Similar literature
 20 similar documents found.
1.
As the volume and variety of information sources continue to grow, it becomes increasingly difficult to obtain information that accurately matches user information needs. A number of factors affect information retrieval effectiveness (the accuracy of matching user information needs against the retrieved information). First, users often do not present search queries in the form that optimally represents their information need. Second, the measure of a document’s relevance is often highly subjective and varies between users. Third, information sources might contain heterogeneous documents in multiple formats, and the representation of documents is not unified. This paper discusses an approach for improving information retrieval effectiveness over document databases. It is proposed that retrieval effectiveness can be improved by applying computational intelligence techniques for modelling information needs, through interactive reinforcement learning. The method combines qualitative (subjective) user relevance feedback with quantitative (algorithmic) measures of the relevance of retrieved documents. An information retrieval system is developed whose retrieval effectiveness is evaluated using traditional precision and recall.
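
The precision and recall measures named in the abstract above can be illustrated with a minimal sketch. The function name and the document identifiers are hypothetical and are not taken from the paper; the sketch only shows the standard set-based definitions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of retrieved documents that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

Two of the four retrieved documents are relevant, and two of the three relevant documents are retrieved, so the example trades some precision for recall.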

2.
Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target languages in response to a user query in a single source language. In a multilingual federated search environment, different information sources contain documents in different languages. A general search strategy in multilingual federated search environments is to translate the user query to each language of the information sources and run a monolingual search in each information source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists from the information sources that are in different languages. This is known as the results merging problem for multilingual information retrieval. Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the other side, a more effective merging method was proposed to download and translate all retrieved documents into the source language and generate the final ranked list by running a monolingual search in the search client. The latter method is more effective but is associated with a large amount of online communication and computation costs. This paper proposes an effective and efficient approach for the results merging task of multilingual ranked lists. Particularly, it downloads only a small number of documents from the individual ranked lists of each user query to calculate comparable document scores by utilizing both the query-based translation method and the document-based translation method. Then, query-specific and source-specific transformation models can be trained for individual ranked lists by using the information of these downloaded documents. These transformation models are used to estimate comparable document scores for all retrieved documents and thus the documents can be sorted into a final ranked list. 
This merging approach is efficient as only a subset of the retrieved documents are downloaded and translated online. Furthermore, an extensive set of experiments on the Cross-Language Evaluation Forum (CLEF) data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results merging algorithm with different transformation models. This paper also provides thorough experimental results as well as detailed analysis. All of the work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results of the cross-language evaluation forum-CLEF 2005, 2005).
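
The merging idea in the abstract above can be sketched as follows: for each source, a few downloaded documents supply pairs of (source-specific score, comparable score), a transformation model is fitted to those pairs, and every retrieved document is then mapped onto the comparable scale and sorted. This is a sketch under stated assumptions: the abstract does not say which transformation models are used, so the simple least-squares line here is an assumption, and all function names and scores are hypothetical.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All training scores are equal: fall back to a constant model.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def merge_ranked_lists(source_lists, training_pairs):
    """Merge per-source ranked lists for ONE query.

    source_lists:   {source: [(doc_id, source_score), ...]}
    training_pairs: {source: [(source_score, comparable_score), ...]}
                    taken from the few documents that were downloaded.
    """
    merged = []
    for source, docs in source_lists.items():
        xs, ys = zip(*training_pairs[source])
        slope, intercept = fit_linear(xs, ys)  # query- and source-specific
        for doc_id, score in docs:
            merged.append((doc_id, slope * score + intercept))
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    lists = {"en": [("e1", 10.0), ("e2", 5.0)],
             "fr": [("f1", 0.9), ("f2", 0.1)]}
    pairs = {"en": [(10.0, 1.0), (0.0, 0.0)],
             "fr": [(1.0, 0.8), (0.0, 0.0)]}
    for doc_id, score in merge_ranked_lists(lists, pairs):
        print(doc_id, round(score, 3))
```

Because a separate model is fitted per source and per query, raw scores on very different scales (here 0 to 10 versus 0 to 1) become comparable without downloading every retrieved document.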

3.
Individual differences have long been of interest in information science as they bear on the design of information systems and services for specific populations. Yet little is known as to which individual differences make a difference to search outcomes, both across applications and for diverse user groups. A scoping study of information seeking and retrieval research from 2000 to 2015 was conducted. Over 2100 articles retrieved from eight scholarly databases were screened based on title, abstract, and full-text (using specified inclusion criteria), resulting in 223 papers for analysis. Data were extracted to provide an overview of the literature, including types of individual differences studied, publication volume over time, measures, samples, and study outcomes. Findings are inconclusive regarding how individual differences affect search outcomes, and raise issues around measurement and generalizability. This study represents an essential first step to developing a more systematic investigation of individual differences research and connecting individual research studies to anchor and guide future work.

4.
The problem of language in Web searching has been discussed primarily in the area of cross-language information retrieval (CLIR). However, much CLIR research centers on investigation of the effectiveness of automatic translation techniques. The case study reported here explored bilingual user behaviors, perceptions, and preferences with respect to the capability of the Web as a multilingual information resource. Twenty-eight bilingual academic users from Myongji University in Korea were recruited for the study. Findings show that the subjects did not use Web search engines as multilingual tools. For search queries, they selected a language that represents their information need most accurately depending on the types of information task rather than choosing their first language. Subjects expressed concerns about the accuracy of machine translation of scholarly terminologies and preferred to have user control over multilingual Web searches.

5.
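A review of mobile visual search research abroad   Total citations: 1 (self-citations: 0, citations by others: 1)
Mobile visual search (MVS), as an important means of acquiring information, has become a frontier topic in information retrieval. Current research methods for MVS mainly include simulation, comparative analysis, literature review, interdisciplinary research, and field investigation. The emergence of MVS will affect models of knowledge interaction and knowledge services, shift search engine market share, and give rise to new industry chains and industrial clusters. MVS architectures can be divided into standard, localized, and hybrid architectures, and involve key technologies such as descriptor processing, visual object matching, visual object retrieval workflows, and the construction of visual object knowledge bases. The main technical bottlenecks at present are: matching software and hardware resources; adapting MVS services and applications to the diversity of visual queries; balancing MVS search performance against user experience; and interoperability between diverse mobile visual services and applications and heterogeneous MVS systems. Library and information professionals should focus on the following: information retrieval models that support MVS, construction of visual object knowledge bases, standardization of MVS systems and visual resources, MVS application analysis and decision support, and training of personnel for MVS development, application, and management. 3 figures, 1 table, 69 references.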

6.
An information retrieval (IR) system can often fail to retrieve relevant documents due to the incomplete specification of information need in the user’s query. Pseudo-relevance feedback (PRF) aims to improve IR effectiveness by exploiting potentially relevant aspects of the information need present in the documents retrieved in an initial search. Standard PRF approaches utilize the information contained in these top ranked documents from the initial search with the assumption that documents as a whole are relevant to the information need. However, in practice, documents are often multi-topical where only a portion of the documents may be relevant to the query. In this situation, exploitation of the topical composition of the top ranked documents, estimated with statistical topic modeling based approaches, can potentially be a useful cue to improve PRF effectiveness. The key idea behind our PRF method is to use the term-topic and the document-topic distributions obtained from topic modeling over the set of top ranked documents to re-rank the initially retrieved documents. The objective is to improve the ranks of documents that are primarily composed of the relevant topics expressed in the information need of the query. Our RF model can further be improved by making use of non-parametric topic modeling, where the number of topics can grow according to the document contents, thus giving the RF model the capability to adjust the number of topics based on the content of the top ranked documents. We empirically validate our topic model based RF approach on two document collections of diverse length and topical composition characteristics: (1) ad-hoc retrieval using the TREC 6-8 and the TREC Robust ’04 dataset, and (2) tweet retrieval using the TREC Microblog ’11 dataset. 
Results indicate that our proposed approach increases MAP by up to 9% in comparison to the results obtained with an LDA based language model (for initial retrieval) coupled with the relevance model (for feedback). Moreover, the non-parametric version of our proposed approach is shown to be more effective than its parametric counterpart due to its advantage of adapting the number of topics, improving results by up to 5.6% of MAP compared to the parametric version.

7.
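Liu Ning, Chai Yaling. 《图书馆杂志》 (Library Journal), 2005, 24(10): 47-51
This paper reviews the current state of natural language applications in China and abroad, analyzes development trends in natural language retrieval, discusses natural language indexing techniques and processing methods, explains the principles of intelligent natural language retrieval and their application in intelligent search engines, offers some views on improving that application, and predicts that the adoption of natural language in third-generation search engines is an inevitable trend.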

8.
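Optimization of the Robot search algorithm in search engines   Total citations: 15 (self-citations: 0, citations by others: 15)
Current search engines increasingly show their shortcomings. When a user enters specific keywords, the results returned often number in the thousands or even millions and contain a large amount of duplicate and junk information, so the user still needs a long time to filter out the pages of interest. In other cases, important pages clearly exist on the Web but are never discovered by the search engine's robot. To address this, the paper focuses on the search strategy in search engines and improves the search algorithm so that the robot can, during the crawling stage, make full use of the URL list it interacts with frequently. The PageRank of a page is computed from its content, its HTML structure, and the hyperlink information it contains, so that the URL list can be reordered by importance. Preliminary experimental results show that the optimized algorithm can considerably improve the overall performance of the search engine.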
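
Reordering a crawler's URL list by PageRank, as the abstract above describes, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses only hyperlink information and the textbook power-iteration form of PageRank, whereas the paper also draws on page content and HTML structure in ways the abstract does not specify. The function names, the damping value, and the toy link graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a link graph {page: [linked pages]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def order_frontier(urls, rank):
    """Sort the crawler's pending URL list by descending estimated rank."""
    return sorted(urls, key=lambda url: rank.get(url, 0.0), reverse=True)


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    ranks = pagerank(graph)
    print(order_frontier(["d", "b", "a", "c"], ranks))
```

In a real crawler the graph is only partially known, so the ranks would be re-estimated as new pages are fetched and the frontier re-sorted periodically.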

9.
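Question-answering information retrieval is a new generation of search engine: it accepts questions expressed in natural language, searches a document collection, and returns precise answers. In such systems, improvements in the retrieval module directly affect the overall performance of the question answering system. This paper studies query optimization techniques in the system, covering two strategies: query optimization based on a pattern knowledge base, and mining semantic entailment information from the Web to build query expansion resources. The question and answer sets provided by TREC (TREC8-TREC13) are used in experiments to test the query optimization methods. The results show that, compared with traditional query generation, the query optimization techniques improve retrieval precision, and a t-test shows that the improvement is statistically significant.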

10.
11.
Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better balanced result set, increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking: smoothed language models and topic models derived by latent Dirichlet allocation. To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community generated balanced representation of their (sub)topics. For these queries we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts.
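
The Kullback-Leibler divergence behind the evaluation strategy in the abstract above can be illustrated with a small sketch. The setup is an assumption: a reference subtopic distribution (such as one derived from a Wikipedia disambiguation page) is compared with the subtopic distribution of a top-k result list, and a lower divergence is read as better topic coverage. The abstract does not give the exact formulation, and the function name, the `epsilon` floor, and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q, epsilon=1e-9):
    """D_KL(p || q) for discrete distributions given as {topic: probability}.

    Probabilities in q are floored at epsilon so that topics missing from
    the result list give a large finite penalty instead of a division by
    zero. This is a crude form of smoothing chosen for illustration.
    """
    total = 0.0
    for topic in set(p) | set(q):
        p_t = p.get(topic, 0.0)
        if p_t > 0.0:
            q_t = max(q.get(topic, 0.0), epsilon)
            total += p_t * math.log(p_t / q_t)
    return total


if __name__ == "__main__":
    reference = {"animal": 0.5, "car": 0.5}   # balanced subtopics
    skewed = {"animal": 0.9, "car": 0.1}      # results favour one meaning
    balanced = {"animal": 0.5, "car": 0.5}    # diversified results
    print(kl_divergence(reference, skewed))
    print(kl_divergence(reference, balanced))
```

The skewed result list scores a higher divergence from the reference than the balanced one, which is the direction a diversification method would try to reduce.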

12.
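The article reviews the current state of natural language applications in China and abroad, analyzes development trends in natural language retrieval, discusses natural language indexing techniques and processing methods, explains the principles of intelligent natural language retrieval and their application in intelligent search engines, offers suggestions for improving that application, and concludes that the adoption of natural language in third-generation search engines is an inevitable trend.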

13.
《期刊图书馆员》 (Serials Librarian), 2013, 64(3): 183-191
SUMMARY

RSS technology is growing in popularity among libraries as a way to distribute, or syndicate, information about new electronic resources and Web content to users. “Really Simple Syndication” is an effective communication tool for libraries because it supplies the user with up-to-date links and announcements on the library Web site after only one initial setup function. RSS does not require the user to make frequent visits to the library Web site for updated information; rather, it gathers content from any Web sites designated by the user, and delivers news to the users in an aggregated format. The benefits of RSS are that the software to set up the service is often free for downloading and many users are already familiar with the application. The “orange button” now present on so many commercial Web sites ranging from news sites to blogs is gaining a presence on library Web sites.

14.
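Research on information filtering mechanisms of intelligent search engines   Total citations: 3 (self-citations: 0, citations by others: 3)
Intelligent search engines are the product of combining artificial intelligence with traditional search engine technology. In a network environment where information is replaced constantly, intelligent search engines offer information processing mechanisms such as intelligent natural language filtering, intelligent multi-document processing, and intelligent user services. To promote their development, attention should be paid to research on user modeling, the development and practice of multi-agent intelligent search engine systems should be strengthened, and research on key intelligent search engine technologies should be intensified.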

15.
This paper presents a Graph Inference retrieval model that integrates structured knowledge resources, statistical information retrieval methods and inference in a unified framework. Key components of the model are a graph-based representation of the corpus and retrieval driven by an inference mechanism achieved as a traversal over the graph. The model is proposed to tackle the semantic gap problem: the mismatch between the raw data and the way a human being interprets it. We break down the semantic gap problem into five core issues, each requiring a specific type of inference in order to be overcome. Our model and evaluation are applied to the medical domain because search within this domain is particularly challenging and, as we show, often requires inference. In addition, this domain features both structured knowledge resources and unstructured text. Our evaluation shows that inference can be effective, retrieving many new relevant documents that are not retrieved by state-of-the-art information retrieval models. We show that many retrieved documents were not pooled by keyword-based search methods, prompting us to perform additional relevance assessment on these new documents. A third of the newly retrieved documents judged were found to be relevant. Our analysis provides a thorough understanding of when and how to apply inference for retrieval, including a categorisation of queries according to the effect of inference. The inference mechanism promoted recall by retrieving new relevant documents not found by previous keyword-based approaches. In addition, it promoted precision by an effective reranking of documents. When inference is used, performance gains can generally be expected on hard queries. However, inference should not be applied universally: for easy, unambiguous queries and queries with few relevant documents, inference did adversely affect effectiveness. 
These conclusions reflect the fact that for retrieval as inference to be effective, a careful balancing act is involved. Finally, although the Graph Inference model is developed and applied to medical search, it is a general retrieval model applicable to other areas such as web search, where an emerging research trend is to utilise structured knowledge resources for more effective semantic search.

16.
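Ontology-based personalized retrieval   Total citations: 4 (self-citations: 0, citations by others: 4)
Most current retrieval tools are designed for all users and do not consider an individual user's particular information needs. This paper proposes an ontology-based personalized retrieval method that automatically learns from the user's query history and builds a user interest model, which is then used to infer the true intent of the user's new queries and meet the user's particular information needs. The method is suited to intelligent information retrieval in specific Internet domains, for specific user groups, on enterprise networks, and in similar settings.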

17.
Ensuring quick and consistent access to large collections of unstructured documents is one of the biggest challenges facing knowledge-intensive organizations. Designing specific vocabularies to index and retrieve documents is often deemed too expensive, full-text search being preferred despite its known limitations. However, the process of creating controlled vocabularies can be partly automated thanks to natural language processing and machine learning techniques. With a case study from the biopharmaceutical industry, we demonstrate how small organizations can use an automated workflow in order to create a controlled vocabulary to index unstructured documents in a semantically meaningful way.

18.
Among the huge maze of resources available on the Internet, UnCoverWeb stands out as a valuable tool for medical libraries. This up-to-date, free-access, multidisciplinary database of periodical references is searched through an easy-to-learn graphical user interface that is a welcome improvement over the telnet version. This article reviews the basic and advanced search techniques for UnCoverWeb, as well as providing information on the document delivery functions and table of contents alerting service called Reveal. UnCover's currency is evaluated and compared with other current awareness resources. System deficiencies are discussed, with the conclusion that although UnCoverWeb lacks the sophisticated features of many commercial database search services, it is nonetheless a useful addition to the repertoire of information sources available in a library.

19.
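Development of information storage and retrieval technology in the network environment   Total citations: 7 (self-citations: 0, citations by others: 7)
Information storage and retrieval technology is an important link in information transfer. Retrieval languages are closely tied to retrieval efficiency and serve as the linguistic foundation of the information retrieval process. To enable different users to retrieve the information they need, retrieval languages must evolve toward natural language and user-friendly interfaces.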

20.
Two-stage statistical language models for text database selection   Total citations: 2 (self-citations: 0, citations by others: 2)
As the number and diversity of distributed Web databases on the Internet increase exponentially, it is difficult for users to know which databases are appropriate to search. Given database language models that describe the content of each database, database selection services can provide assistance in locating databases relevant to the information needs of users. In this paper, we propose a database selection approach based on statistical language modeling. The basic idea behind the approach is that, for databases that are categorized into a topic hierarchy, individual language models are estimated at different search stages, and then the databases are ranked by the similarity to the query according to the estimated language model. Two-stage smoothed language models are presented to circumvent inaccuracy due to word sparseness. Experimental results demonstrate that such a language modeling approach is competitive with current state-of-the-art database selection approaches.
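
Ranking databases by a smoothed query likelihood, in the spirit of the abstract above, can be sketched as follows. This is a sketch under stated assumptions: the abstract does not define its two smoothing stages or how the topic hierarchy enters, so the sketch uses a generic two-stage form (Dirichlet-prior smoothing followed by linear interpolation with a background model) and ignores the hierarchy. The function names, the parameters `mu` and `lam`, and the toy term counts are hypothetical.

```python
import math
from collections import Counter


def two_stage_log_likelihood(query_terms, db_counts, background_counts,
                             mu=10.0, lam=0.5):
    """Log-likelihood of a query under a two-stage smoothed database model."""
    db_len = sum(db_counts.values())
    bg_len = sum(background_counts.values())
    vocab = len(background_counts) + 1  # one extra slot for unseen terms
    score = 0.0
    for term in query_terms:
        # Background probability with add-one smoothing, so it is never zero.
        p_bg = (background_counts.get(term, 0) + 1) / (bg_len + vocab)
        # Stage 1: Dirichlet-prior smoothing of the database model.
        p_dir = (db_counts.get(term, 0) + mu * p_bg) / (db_len + mu)
        # Stage 2: interpolation with the background model.
        score += math.log((1.0 - lam) * p_dir + lam * p_bg)
    return score


def rank_databases(query_terms, databases, mu=10.0, lam=0.5):
    """Return database names sorted by descending query log-likelihood."""
    background = Counter()
    for counts in databases.values():
        background.update(counts)
    scores = {name: two_stage_log_likelihood(query_terms, counts,
                                             background, mu, lam)
              for name, counts in databases.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    dbs = {"medicine": Counter({"heart": 5, "drug": 5}),
           "sports": Counter({"football": 8, "heart": 1, "goal": 1})}
    print(rank_databases(["heart", "drug"], dbs))
```

Smoothing keeps a database from receiving zero probability when a query term is absent from its sample, which is the word sparseness problem the abstract mentions.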
