Similar literature
1.
Relevance feedback is well known to be an important method for improving the effectiveness of information retrieval systems. Improving effectiveness matters because these systems must access large document collections distributed over distant sites, and the effort needed to retrieve relevant documents has consequently grown significantly. Relevance feedback can be viewed as an aid to the information retrieval task. This paper presents a relevance feedback strategy based on back-propagation of the relevance of retrieved documents, using an algorithm developed in a neural approach. It describes a neural information retrieval model and emphasizes the results obtained with the associated relevance back-propagation algorithm in three different environments: manual ad hoc, automatic ad hoc, and mixed ad hoc (automatic plus manual).

2.
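Chen Jie (陈洁). 《情报探索》, 2020, (2): 114-119
[Purpose/Significance] To provide a reference for research on relevance in information retrieval. [Method/Process] Using CNKI as the data source and a qualitative approach, the paper reviews the historical development and the schools of research in information retrieval, and analyzes its influencing factors and development trends. [Result/Conclusion] Relevance in information retrieval is a composite of user relevance and system relevance, and neither can be separated from the other. Relevance should treat the user as the key and the system as the foundation, studying the interaction between users and retrieval systems, user cognition, and the description and feedback of real needs. As research on relevance in information retrieval deepens, the system view and the user view will merge, and retrieval technology and user needs will be brought into harmony, jointly advancing retrieval relevance.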

3.
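A study of the basic theory of relevance and its role in retrieval modeling   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper is a theoretical study of information retrieval. After summarizing existing research on relevance, it systematically reviews the relevance factors in information retrieval models. It argues that system relevance factors are not clearly expressed in existing retrieval models, and that user relevance factors have not been well incorporated into research on system relevance and system design. A concept related to relevance is similarity, which exists in the text space. Compared with relevance, similarity has better mathematical properties. The query is the vehicle for the process of relevance judgment, and it is also the bottleneck of information retrieval research. Finding a more appropriate metaphor for relevance requires stepping outside the confines of text, starting from deeper pattern relevance and exploring more complex relevance factors.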

4.
Analyzing the actions to be supported by information and information retrieval (IR) systems is vital for understanding the need for different types of information, search strategies and relevance assessments; in short, for understanding IR. A necessary condition for this understanding is to link results from information seeking studies to the body of knowledge built by IR studies. The actions in focus in this paper are tasks viewed from the angle of problem solving. I analyze certain features of work tasks and relate them to the types of information people look for and use in their tasks, the patterning of search strategies for obtaining information, and relevance assessments in choosing retrieved documents. The major claim is that these information activities are systematically connected to task complexity and the structure of the problem at hand. The argument is based on both theoretical and empirical results from studies on information retrieval and seeking.

5.
We are interested in how ideas from document clustering can be used to improve the retrieval accuracy of ranked lists in interactive systems. In particular, we are interested in ways to evaluate the effectiveness of such systems to decide how they might best be constructed. In this study, we construct and evaluate systems that present the user with ranked lists and a visualization of inter-document similarities. We first carry out a user study to evaluate the clustering/ranked list combination on instance-oriented retrieval, the task of the TREC-6 Interactive Track. We find that although users generally prefer the combination, they are not able to use it to improve effectiveness. In the second half of this study, we develop and evaluate an approach that more directly combines the ranked list with information from inter-document similarities. Using the TREC collections and relevance judgments, we show that it is possible to realize substantial improvements in effectiveness by doing so, and that although users can use the combined information effectively, the system can provide hints that substantially improve on the user's solo effort. The resulting approach shares much in common with an interactive application of incremental relevance feedback. Throughout this study, we illustrate our work using two prototype systems constructed for these evaluations. The first, AspInQuery, is a classic information retrieval system augmented with a specialized tool for recording information about instances of relevance. The other system, Lighthouse, is a Web-based application that combines a ranked list with a portrayal of inter-document similarity. Lighthouse can work with collections such as TREC, as well as the results of Web search engines.

6.
Term classifications and thesauri can be used for many purposes in automatic information retrieval. Normally a thesaurus is generated manually by subject experts; alternatively, the associations between the terms can be obtained automatically by using the occurrence characteristics of the terms across the documents of a collection. A third possibility consists in taking into account user relevance assessments of certain documents with respect to certain queries in order to build term classes designed to retrieve the relevant documents and simultaneously to reject the nonrelevant documents. This last strategy, known as pseudoclassification, produces a user-dependent term classification. A number of pseudoclassification studies are summarized in the present report, and conclusions are reached concerning the effectiveness and feasibility of constructing term classifications based on human relevance assessments.

7.
In this paper we present a new algorithm for relevance feedback (RF) in information retrieval. Unlike conventional RF algorithms, which use the top ranked documents for feedback, our proposed algorithm is an active feedback algorithm that actively chooses documents for the user to judge. The objectives are (a) to increase the number of judged relevant documents and (b) to increase the diversity of judged documents during the RF process. The algorithm uses document-contexts by splitting the retrieval list into sub-lists according to the query term patterns that exist in the top ranked documents. Query term patterns include a single query term, a pair of query terms that occur in a phrase, and query terms that occur in proximity. The algorithm is iterative and takes one document for feedback in each iteration. We experiment with the algorithm using the TREC-6, -7, -8, -2005 and GOV2 data collections, and we simulate user feedback using the TREC relevance judgements. The experimental results show that our proposed split-list algorithm is better than the conventional RF algorithm and more reliable than a similar algorithm using maximal marginal relevance.

8.
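Lu Xiaohui (陆小辉). 《现代情报》, 2006, 26(2): 125-127
In traditional information retrieval, recall and precision are commonly used to evaluate the performance of retrieval systems and the effectiveness of retrieval. With the rapid development of networked information, the sharp growth in the total volume of information, and the ever faster pace of information exchange, relevance, a key concept in information retrieval, is receiving increasing attention. This paper analyzes the concept, composition and evaluation measures of relevance in information retrieval, and describes measures for improving retrieval relevance.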
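Recall and precision, the two measures named in this abstract, can be illustrated with a minimal sketch. The function name `precision_recall` and the document identifiers are hypothetical and chosen only for illustration; the definitions used are the standard set-based ones (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant), not anything specific to the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 judged relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Empty inputs return 0.0 here rather than raising; that is a design choice of this sketch, not a convention taken from the abstract.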

9.
The primary aim of this study is to suggest a formalized definition (“explication”) of the “relevance relationship” between texts, including an explication of the concept of “degree of relevance”. The concept of an information language (IL), its vocabulary and syntax, and the notion of the “semantic power” of an information language are defined. The concept of ideally functioning information retrieval systems (IRS) is suggested, and different kinds of deviations from such IRS are considered.

10.
This paper presents a laboratory-based evaluation study of cross-language information retrieval technologies, utilizing partially parallel test collections, NTCIR-2 (used together with NTCIR-1), where Japanese–English parallel document collections, parallel topic sets and their relevance judgments are available. These enable us to observe and compare monolingual retrieval processes in two languages as well as retrieval across languages. Our experiments focused on (1) the Rosetta stone question (whether or not a partially parallel collection helps in cross-language information access) and (2) two aspects of retrieval difficulty, namely “collection discrepancy” and “query discrepancy”. Japanese and English monolingual retrieval systems are combined by dictionary-based query translation modules so that a symmetrical bilingual evaluation environment is implemented.

11.
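Lu Xiaohui (陆小辉). 《科技广场》, 2005, 21(8): 75-77
In traditional information retrieval, recall and precision are commonly used to evaluate the performance of retrieval systems and the effectiveness of retrieval. With the sharp growth in the total volume of information, the continual change in the forms of information carriers, and the ever faster pace of information exchange, relevance, a key concept in information retrieval, is receiving increasing attention. This paper analyzes the concept, composition and evaluation measures of relevance in information retrieval, and describes measures for improving retrieval relevance.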

12.
This paper presents an investigation of how to automatically formulate effective queries using full or partial relevance information (i.e., the terms that are in relevant documents) in the context of relevance feedback (RF). The effects of adding relevance information in the RF environment are studied via controlled experiments. The conditions of these controlled experiments are formalized into a set of assumptions that form the framework of our study, called the idealized relevance feedback (IRF) framework. In our IRF settings, we confirm the previous findings of relevance feedback studies. In addition, our experiments show that better retrieval effectiveness can be obtained when (i) we normalize the term weights by their ranks, (ii) we select weighted terms in the top K retrieved documents, (iii) we include terms in the initial title queries, and (iv) we use the best query sizes for each topic instead of the average best query size where they produce at most five percentage points improvement in the mean average precision (MAP) value. We have also achieved a new level of retrieval effectiveness which is about 55–60% MAP instead of 40+% in the previous findings. This new level of retrieval effectiveness was found to be similar to a level using a TREC ad hoc test collection that is about double the number of documents in the TREC-3 test collection used in previous works.
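The mean average precision (MAP) value reported in this abstract can be made concrete with a short sketch. It uses the standard definition of non-interpolated average precision; the function names and the toy rankings are hypothetical, and the sketch assumes each ranked list contains no duplicate document ids.

```python
def average_precision(ranked, relevant):
    """Non-interpolated average precision for one topic.

    ranked:   list of document ids in rank order (assumed free of duplicates)
    relevant: iterable of document ids judged relevant for the topic
    """
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant_set:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant_set)


def mean_average_precision(topics):
    """Mean of average precision over a list of (ranked, relevant) pairs."""
    if not topics:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in topics) / len(topics)


# Hypothetical two-topic example.
topics = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = 1/2
]
print(round(mean_average_precision(topics), 4))
```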

13.
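Relevance in information retrieval based on relevance theory: information production and indexing   Cited by: 1 (self-citations: 0, citations by others: 1)
Abstract: Building on the research of Saracevic and Harter, the paper proposes adopting relevance theory from linguistics as the theoretical basis for relevance research, and uses relevance theory to explain two tasks in the interactive model of information retrieval: information production and information indexing.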

14.
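Relevance in information retrieval based on relevance theory   Cited by: 1 (self-citations: 0, citations by others: 1)
The information processing model of relevance theory is used to explain the relevance assessment module in the interactive model of information retrieval. The paper argues that it is feasible to use this model to explain the relevance judgment process in information retrieval.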

15.
The varying uses of the term “relevance” are considered by analyzing retrieval-related activities into three different processes (Formulation, Retrieval, and Utilization) and four sorts of entities (Enquiry, Attribute, Response, and Benefit). Three quite different sorts of “relevance” are identified (Responsiveness, Pertinence, and Beneficiality). It is argued that relevance in the sense of utility cannot properly be used to evaluate the performance of retrieval processes as retrieval processes. Any use of utility implies that the utilization is also being considered.

16.
This paper proposes a method to improve the retrieval performance of the vector space model (VSM), in part by utilizing user-supplied information about which documents are relevant to the query in question. In addition to the user's relevance feedback information, information such as original document similarities is incorporated into the retrieval model, which is built by using a sequence of linear transformations. High-dimensional and sparse vectors are then reduced by singular value decomposition (SVD) and transformed into a low-dimensional vector space, namely the space representing the latent semantic meanings of words. The method has been tested with two test collections, the Medline collection and the Cranfield collection. In order to train the model, multiple partitions are created for each collection. Averaged over all partitions, the improvements in average precision over the latent semantic indexing (LSI) model are 20.57% (Medline) and 22.23% (Cranfield) for the training data, and 0.47% (Medline) and 4.78% (Cranfield) for the test data. The proposed method provides an approach that makes it possible to preserve user-supplied relevance information in the system for the long term in order to use it later.
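The SVD-based reduction mentioned in this abstract can be sketched as plain latent semantic indexing, the baseline the paper compares against. This is not the paper's feedback-augmented model: the function names, the tiny term-document matrix and the choice of `k` are all assumptions made for illustration, and NumPy is assumed to be available. The sketch also assumes the top `k` singular values are nonzero.

```python
import numpy as np


def lsi_decompose(term_doc, k):
    """Truncated SVD of a terms x documents matrix.

    Returns (U_k, s_k, doc_vectors), where row j of doc_vectors is the
    k-dimensional representation of document j (a row of V_k).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(term_vector, U_k, s_k):
    """Project a term-space vector into the latent space: q^T U_k S_k^{-1}."""
    return (term_vector @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


# Hypothetical 4-term x 3-document count matrix.
A = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])
U_k, s_k, docs = lsi_decompose(A, k=2)
query = fold_in(np.array([1.0, 1.0, 0.0, 0.0]), U_k, s_k)
scores = [cosine(query, d) for d in docs]
print(scores)
```

Folding in a column of the original matrix reproduces that document's latent vector, which gives a quick sanity check that queries and documents live in the same space.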

17.
18.
Focusing on the context of XML retrieval, in this paper we propose a general methodology for managing structured queries (involving both content and structure) within any given structured probabilistic information retrieval system which is able to compute posterior probabilities of relevance for structural components given a non-structured query (involving only query terms but not structural restrictions). We have tested our proposal using two specific information retrieval systems (Garnata and PF/Tijah), and the structured document collections from the last six editions of the INitiative for the Evaluation of XML Retrieval (INEX).

19.
This paper addresses the problem of how to rank retrieval systems without the need for human relevance judgments, which are very resource intensive to obtain. Using TREC 3, 6, 7 and 8 data, it is shown how the overlap structure between the search results of multiple systems can be used to infer relative performance differences. In particular, the overlap structures for random groupings of five systems are computed, so that each system is selected an equal number of times. It is shown that the average percentage of a system’s documents that are found only by it and by no other system is strongly and negatively correlated with its retrieval effectiveness, as measured for example by its mean average precision or precision at 1000. The presented method uses the degree of consensus or agreement a retrieval system can generate to infer its quality. This paper also addresses the question of how many documents in a ranked list need to be examined to be able to rank the systems. It is shown that the overlap structure of the top 50 documents can be used to rank the systems, often producing the best results. The presented method significantly improves upon previous attempts to rank retrieval systems without the need for human relevance judgments. This “structure of overlap” method can be of value to communities that need to identify the best experts or rank them, but do not have the resources to evaluate the experts’ recommendations, since it does not require knowledge about the domain being searched or the information being requested.
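The quantity at the heart of this abstract, the share of a system's documents that no other system in its group retrieves, can be sketched as follows. The function names, the system labels and the toy result lists are hypothetical; the default depth of 50 mirrors the cutoff mentioned in the abstract, and the random grouping of five systems is left to the caller, who supplies the groups explicitly.

```python
def unique_fraction(results, system, depth=50):
    """Fraction of `system`'s top-`depth` documents found by no other system.

    results: dict mapping system name -> ranked list of document ids
             for one group of systems on one topic
    """
    own = set(results[system][:depth])
    if not own:
        return 0.0
    others = set()
    for name, ranked in results.items():
        if name != system:
            others.update(ranked[:depth])
    return len(own - others) / len(own)


def mean_unique_fraction(results, groups, system, depth=50):
    """Average unique fraction over the supplied groups that contain `system`."""
    values = [
        unique_fraction({name: results[name] for name in group}, system, depth)
        for group in groups
        if system in group
    ]
    return sum(values) / len(values) if values else 0.0


# Hypothetical result lists for three systems on one topic.
results = {
    "A": ["d1", "d2", "d3", "d4"],
    "B": ["d1", "d2", "d5", "d6"],
    "C": ["d1", "d3", "d7", "d8"],
}
for name in results:
    print(name, unique_fraction(results, name))
```

Under the abstract's finding, systems with a lower unique fraction would be ranked as more effective; turning these fractions into a final ranking is not shown here.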

20.
Term weighting for document ranking and retrieval has been an important research topic in information retrieval for decades. We propose a novel term weighting method based on a hypothesis that a term’s role in accumulated retrieval sessions in the past affects its general importance regardless. It utilizes the availability of past retrieval results consisting of the queries that contain a particular term, the retrieved documents, and their relevance judgments. A term’s evidential weight, as we propose in this paper, depends on the degree to which the mean frequency values for the relevant and non-relevant document distributions in the past are different. More precisely, it takes into account the rankings and similarity values of the relevant and non-relevant documents. Our experimental results using standard test collections show that the proposed term weighting scheme improves on conventional TF*IDF and language model based schemes. This indicates that evidential term weights bring in a new aspect of term importance and complement the collection statistics based on TF*IDF. We also show how the proposed term weighting scheme based on the notion of evidential weights is related to the well-known weighting schemes based on language modeling and probabilistic models.
