首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
信息搜寻中用户查询重构研究综述   总被引:1,自引:0,他引:1  
李纲  胡蓉 《图书情报工作》2014,58(11):123-129
基于信息搜寻中人机交互行为,从查询重构类型与模式、查询重构绩效、查询重构影响因素及查询式扩展技术4个方面综述国内外关于用户查询重构的研究。得出结论:基于查询重构模式研究,可获悉用户偏向使用的查询重构序列;结合查询重构影响因素和重构序列,可向不同群体用户推荐高概率的查询词,但该研究也存在一定的局限;尽管从检索系统角度,查询重构得到不少学者的广泛关注,但关于用户如何重构查询的研究在中文文献中尚未见到。  相似文献   

2.
[目的/意义] 探讨高校图书馆用户在使用图书馆OPAC系统查找相关资源时调整提问的行为模式。[方法/过程] 以北京师范大学图书馆OPAC日志数据为对象,采用S.Y.Rieh与Xie Hong提出的提问调整模式类型,利用内容分析法对提问日志进行内容编码和统计分析。[结果/结论] 高校图书馆用户的OPAC提问调整基本模式与网络信息检索提问调整模式基本一致,并且,在动态调整模式过程中,还可以细化为直线、阶梯、锯齿、凹凸、循环等子模式。针对如何优化OPAC系统和提升用户信息素养提出若干建议。  相似文献   

3.
Query recommendation has long been considered a key feature of search engines, which can improve users’ search experience by providing useful query suggestions for their search tasks. Most existing approaches on query recommendation aim to recommend relevant queries, i.e., alternative queries similar to a user’s initial query. However, the ultimate goal of query recommendation is to assist users to reformulate queries so that they can accomplish their search task successfully and quickly. Only considering relevance in query recommendation is apparently not directly toward this goal. In this paper, we argue that it is more important to directly recommend queries with high utility, i.e., queries that can better satisfy users’ information needs. For this purpose, we attempt to infer query utility from users’ sequential search behaviors recorded in their search sessions. Specifically, we propose a dynamic Bayesian network, referred as Query Utility Model (QUM), to capture query utility by simultaneously modeling users’ reformulation and click behaviors. We then recommend queries with high utility to help users better accomplish their search tasks. We empirically evaluated the performance of our approach on a publicly released query log by comparing with the state-of-the-art methods. The experimental results show that, by recommending high utility queries, our approach is far more effective in helping users find relevant search results and thus satisfying their information needs.  相似文献   

4.
老年人网络健康信息检索行为实验研究   总被引:7,自引:2,他引:5  
吴丹  李一喆 《图书情报工作》2014,58(12):102-108
采用用户检索实验的方法,从检索工具选择、检索式构造、检索结果浏览和选择4个阶段分析老年人网络健康信息检索的行为特点、行为模式以及网络健康信息对老年人日常信息获取的影响。研究结果发现:老年人在检索过程中表现出明显的依赖性和定势性。首页/重选网页模式、跟随链接模式、重构检索式模式是老年人的高频检索行为模式。健康状况、网络熟悉程度和网络健康信息的可信度是老年人日常利用网络获取健康信息时考虑的主要因素。  相似文献   

5.
搜索引擎中Robot搜索算法的优化   总被引:15,自引:0,他引:15  
目前的搜索引擎越来越暴露出不足之处 ,当用户使用搜索引擎时输入特定关键词之后 ,返回的查询结果往往有数千甚至几百万之多 ,而且其中包含大量的重复信息与垃圾信息 ,用户从中筛选出自己感兴趣的网页仍然需要耗费很长的时间。另外一种情况就是 ,Web上明明存在某些重要网页 ,却没有被搜索引擎的robot发现。本文针对这种现象 ,重点讨论搜索引擎中的搜索策略 ,改善搜索算法 ,使Robot在搜索阶段就能够充分处理与Robot频繁交互的URL列表。根据网页的内容、HTML结构以及其中包含的超链信息计算网页的PageRank ,使URL列表能够根据重要性调整排列顺序。初步的试验结果表明 ,本文的优化算法可以较大程度地改进搜索引擎的整体性能  相似文献   

6.
Enterprise search is important, and the search quality has a direct impact on the productivity of an enterprise. Enterprise data contain both structured and unstructured information. Since these two types of information are complementary and the structured information such as relational databases is designed based on ER (entity-relationship) models, there is a rich body of information about entities in enterprise data. As a result, many information needs of enterprise search center around entities. For example, a user may formulate a query describing a problem that she encounters with an entity, e.g., the web browser, and want to retrieve relevant documents to solve the problem. Intuitively, information related to the entities mentioned in the query, such as related entities and their relations, would be useful to reformulate the query and improve the retrieval performance. However, most existing studies on query expansion are term-centric. In this paper, we propose a novel entity-centric query expansion framework for enterprise search. Specifically, given a query containing entities, we first utilize both unstructured and structured information to find entities that are related to the ones in the query. We then discuss how to adapt existing feedback methods to use the related entities and their relations to improve search quality. Experimental results over two real-world enterprise collections show that the proposed entity-centric query expansion strategies are more effective and robust to improve the search performance than the state-of-the-art pseudo feedback methods for long natural language-like queries with entities. Moreover, results over a TREC ad hoc retrieval collections show that the proposed methods can also work well for short keyword queries in the general search domain.  相似文献   

7.
文章从提高科技文献检索质量的视角出发,提出基于本体的科技文献检索框架,就本体构建、文献语义空间、查询请求重构、检索过程等方面进行研究,并给出关键算法。指出本检索框架与现有研究相比,主要特征包括:基于规则自动生成文献资源的语义扩展模型;构造“特征词汇-文献-概念”三层子网结构的文献信息空间;引入用户兴趣模型,强调有关用户的这些知识将对新的检索策略的产生和发展产生影响。  相似文献   

8.
针对语义检索在实际应用中面临的用户查询意图获取困难、潜在语义索引计算复杂、领域本体覆盖范围小、概念语义类型不丰富、自动化程度低等问题,提出基于WordNet和SUMO本体集成的自动语义检索及可视化模型。实验表明这种模型能够过滤掉大量与用户查询无关的信息,提高信息检索系统的检准率,并很好地满足用户可视化和个性化检索需求。  相似文献   

9.
The critical task of predicting clicks on search advertisements is typically addressed by learning from historical click data. When enough history is observed for a given query-ad pair, future clicks can be accurately modeled. However, based on the empirical distribution of queries, sufficient historical information is unavailable for many query-ad pairs. The sparsity of data for new and rare queries makes it difficult to accurately estimate clicks for a significant portion of typical search engine traffic. In this paper we provide analysis to motivate modeling approaches that can reduce the sparsity of the large space of user search queries. We then propose methods to improve click and relevance models for sponsored search by mining click behavior for partial user queries. We aggregate click history for individual query words, as well as for phrases extracted with a CRF model. The new models show significant improvement in clicks and revenue compared to state-of-the-art baselines trained on several months of query logs. Results are reported on live traffic of a commercial search engine, in addition to results from offline evaluation.  相似文献   

10.
11.
刘畅  宋筱璇 《图书情报工作》2017,61(16):122-134
[目的/意义]用户的检索式行为是用户信息搜索的重要环节,本文拟通过综述的形式对这些研究进行梳理,形成较为完整的综述。[方法/过程]通过对国内外相关文献的梳理,将检索式构建行为划分为检索词、检索式和会话层三个层面,以及词汇、语法和语义三个维度,对每个维度及不同维度之间的研究的区别与联系进行讨论,并对检索式的重构行为、检索式的质量和效果评估、以及影响用户检索式行为的要素等几个方面的相关研究进行总结。[结果/结论]已有研究对于检索式行为中的检索词和检索式的词汇研究较为丰富,未来需要增加对检索式的语法和语义的研究,以便深入理解用户的检索式构成特征。另外,关于检索式重构的类别和模式的自动识别的探索有所不足。在检索式的质量和效果评估方面,还需结合多种情境因素,更深入地研究易于用户理解和利于其搜索的检索式推荐模式。  相似文献   

12.
Mobile Agents for Distributed and Heterogeneous Information Retrieval   总被引:1,自引:0,他引:1  
The heterogeneous, distributed and voluminous nature of many government and corporate data sources impose severe constraints on meeting the diverse requirements of users who analyze the data. Additionally, communication bandwidth limitations, time constraints, and multiple data formats impose further restrictions on users of these distributed data sources. In this paper, we present an Agent-based Complex QUerying and Information Retrieval Engine (ACQUIRE) for large, heterogeneous, and distributed data sources. ACQUIRE acts as a softbot or interface agent by presenting users with a view of a single, unified, homogenous data source, against which users can pose high-level declarative queries. ACQUIRE translates each such user query into a set of sub-queries by employing a combination of planning and traditional database query optimization techniques. ACQUIRE then spawns a set of mobile agents corresponding to these sub-queries, which in turn retrieve the data from various distributed data sources by dynamically optimizing the retrieval strategy as it is carried out. These mobile agents carry with them data-processing code that can be executed at the remote site, thus reducing the size of data returned by the agent. When all mobile agents have returned, ACQUIRE filters and merges the retrieved data and presents the results to the user. While the system is still very much a work in progress, current validation experiments on simulated NASA Distributed Active Archive Centers (DAACs) have demonstrated that complex queries can be effectively decomposed and retrieved by this approach.  相似文献   

13.
[目的/意义] 揭示移动图书馆用户的查询式构造行为特征,并为移动图书馆的检索功能改进提出建议。[方法/过程] 采用系统日志挖掘法,根据某高校移动图书馆为期一个月的用户日志,通过统计分析方法,利用互信息值、查询式多样性、查询式丰富性、学科分布、持续时间等指标考察移动图书馆用户的查询式关联性、查询重构模式、查询式主题等方面。[结果/结论] 移动图书馆用户的查询式互信息值普遍较低,即查询式在内容上的关联性较弱;重复模式和直线模式是最常见的重构模式,即移动图书馆用户反复搜索同一查询式;移动图书馆用户的搜索兴趣集中在人文社科领域,用户对相同主题查询式的搜索行为具有持续性。建议增加查询推荐功能、自动纠错功能和高级检索功能,以提高移动图书馆检索服务的查全率和查准率。  相似文献   

14.
[目的/意义]了解、分析和识别用户学术搜索时所表达的信息需求是优化查询结果、提高学术搜索引擎用户体验的首要步骤,而用户进行学术搜索时通过查询表达式所表达的用户表意信息需求及潜在信息需求可称之为学术查询意图。本文总结学术查询意图类目体系有助于学术查询意图识别和检索结果页面的呈现。[方法/过程]在A.Broder的查询意图类目体系的基础上,结合百度学术搜索查询日志中查询表达式实例,构建学术查询意图的类目体系。以此为基础,总结不同类别的学术查询意图,并分析不同类别学术查询意图下查询表达式的特点。[结果/结论]学术查询意图主要分为学术文献类、学术实体类、学术探索类、知识问答类和非学术文献类五大类;得出不同类别学术查询意图在学术搜索中的大致比例;给出每类学术查询意图的查询表达式特征、查询情景和查询结果页。  相似文献   

15.
Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better balanced result set increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking. Smoothed language models and topic models derived by Latent Dirichlet?allocation. To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community generated balanced representation of their (sub)topics. For these queries we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts.  相似文献   

16.
苏颖 《情报工程》2015,1(5):008-017
专利检索是一个非常复杂的过程,用户为了迅速高效地完成检索任务需要得到支持。专利检索过程的许多环节可以借助一些工具完成,其中就包括查询(式)构造工具。查询构造是一项高度依赖人工的任务,工具只能实现对可能有用数据进行预先计算,并针对用户进行可视化。信息检索系统中,查询过程和查询结果可视化的方式有很多。本研究提出了两种典型的原型系统设计,用于在专利检索过程中对不同的查询表达式进行比较。原型包含查询表达式构造因素和结果集大小因素,两种因素对于专利领域专家探究查询表达式的调整对检索效率的影响至关重要。本文开发的系统有助于在专利检索过程中对复杂查询表达式进行逐步优化,系统设计思想基于了领域专家型知识工程。  相似文献   

17.
Cluster-based and passage-based document retrieval paradigms were shown to be effective. While the former are based on utilizing query-related corpus context manifested in clusters of similar documents, the latter address the fact that a document can be relevant even if only a very small part of it contains query-pertaining information. Hence, cluster-based approaches could be viewed as based on “expanding” the document representation, while passage-based approaches can be thought of as utilizing a “contracted” document representation. We present a study of the relative benefits of using each of these two approaches, and of the potential merits of their integration. To that end, we devise two methods that integrate whole-document-based, cluster-based and passage-based information. The methods are applied for the re-ranking task, that is, re-ordering documents in an initially retrieved list so as to improve precision at the very top ranks. Extensive empirical evaluation attests to the potential merits of integrating these information types. Specifically, the resultant performance substantially transcends that of the initial ranking; and, is often better than that of a state-of-the-art pseudo-feedback-based query expansion approach.  相似文献   

18.
王仁武  袁毅 《图书馆论坛》2011,31(4):100-102
用户访问行为信息记录在Web日志中,通过对海量Web日志进行清洗、抽取和加载来构建用户行为数据仓库,并结合文章所提出的用户访问路径概率矩阵模型进行数据挖掘,可以实现智能化的用户行为监控,可以为用户提供及时优质的信息服务。  相似文献   

19.
Despite a clear improvement of search and retrieval temporal applications, current search engines are still mostly unaware of the temporal dimension. Indeed, in most cases, systems are limited to offering the user the chance to restrict the search to a particular time period or to simply rely on an explicitly specified time span. If the user is not explicit in his/her search intents (e.g., “philip seymour hoffman”) search engines may likely fail to present an overall historic perspective of the topic. In most such cases, they are limited to retrieving the most recent results. One possible solution to this shortcoming is to understand the different time periods of the query. In this context, most state-of-the-art methodologies consider any occurrence of temporal expressions in web documents and other web data as equally relevant to an implicit time sensitive query. To approach this problem in a more adequate manner, we propose in this paper the detection of relevant temporal expressions to the query. Unlike previous metadata and query log-based approaches, we show how to achieve this goal based on information extracted from document content. However, instead of simply focusing on the detection of the most obvious date we are also interested in retrieving the set of dates that are relevant to the query. Towards this goal, we define a general similarity measure that makes use of co-occurrences of words and years based on corpus statistics and a classification methodology that is able to identify the set of top relevant dates for a given implicit time sensitive query, while filtering out the non-relevant ones. Through extensive experimental evaluation, we mean to demonstrate that our approach offers promising results in the field of temporal information retrieval (T-IR), as demonstrated by the experiments conducted over several baselines on web corpora collections.  相似文献   

20.
基于伪相关反馈的跨语言查询扩展   总被引:3,自引:2,他引:1  
相关反馈是一种重要的查询重构技术,本文分析了两类相关反馈技术,一是按用户是否参与可分为伪相关反馈和交互式相关反馈,二是按作用于查询的方式可分为查询扩展与检索词重新加权.在此基础上,本文重点探讨了将相关反馈技术应用于跨语言信息检索,提出了翻译前查询扩展、翻译后查询扩展、翻译前与翻译后相结合的查询扩展三种方法.最后,本文通过伪相关反馈实验对这三种方法进行了比较,实验结果显示,三种跨语言查询扩展方法都能够有效地提高检索结果的精度,其中翻译后查询扩展方法相对更优越.此外,查询式的长度对不同跨语言查询扩展方法产生着不同程度的影响.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号