Similar literature
1.
Searching a large document collection for material that satisfies a user's information need is a critical activity for web search engines. Query expansion (QE) techniques are widely used by search engines to disambiguate the user's information need and to improve information retrieval (IR) performance. Knowledge-based, corpus-based and relevance-feedback techniques are the main QE approaches. They expand the user query with synonyms of the search terms (addressing word synonymy) in order to retrieve more relevant documents, and they filter out documents that contain the search terms but with a different meaning than the user intended (the word polysemy problem). This work surveys existing query expansion techniques, highlights their strengths and limitations, and introduces a new method that combines the power of knowledge-based or corpus-based techniques with that of relevance feedback. Experimental evaluation on three information retrieval benchmark datasets shows that applying knowledge-based or corpus-based query expansion to the results of the relevance feedback step improves retrieval performance, with knowledge-based techniques providing significantly better results than their simple relevance feedback alternatives on all datasets.
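The general idea of layering a knowledge-based expansion on top of a relevance feedback step can be illustrated with a toy sketch. This is not the method evaluated in the abstract: the function name `expand_query`, the frequency-based term selection, the toy thesaurus and the sample documents are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def expand_query(query: Sequence[str],
                 feedback_docs: Sequence[str],
                 thesaurus: Dict[str, List[str]],
                 top_n: int = 1) -> List[str]:
    """Expand a query in two steps (hypothetical helper, toy logic only)."""
    seen = set(query)
    expanded = list(query)

    # Step 1 (relevance feedback): add the most frequent non-query terms
    # found in documents assumed to be relevant.
    counts = Counter(
        tok
        for doc in feedback_docs
        for tok in doc.lower().split()
        if tok not in seen
    )
    for term, _ in counts.most_common(top_n):
        expanded.append(term)
        seen.add(term)

    # Step 2 (knowledge-based): add thesaurus synonyms for every term of the
    # feedback-expanded query. The thesaurus is a stand-in for a real
    # lexical resource.
    for term in list(expanded):
        for synonym in thesaurus.get(term, []):
            if synonym not in seen:
                expanded.append(synonym)
                seen.add(synonym)
    return expanded


expanded = expand_query(
    query=["jaguar", "speed"],
    feedback_docs=[
        "the jaguar car reaches top speed quickly",
        "jaguar car engine and car speed",
    ],
    thesaurus={"car": ["automobile"], "speed": ["velocity"]},
)
print(expanded)
```

In this toy run the feedback documents push the ambiguous query "jaguar" towards its automotive sense before synonyms are added, which is the kind of disambiguation the abstract attributes to combining the two families of techniques.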

2.
As an effective technique for improving retrieval effectiveness, relevance feedback (RF) has been widely studied in both monolingual and translingual information retrieval (TLIR). Studies of RF in TLIR have focused on query expansion (QE), in which queries are reformulated before and/or after they are translated. However, RF in TLIR can not only help select better query terms but also enhance query translation by adjusting translation probabilities and even resolving some out-of-vocabulary terms. In this paper, we propose a novel relevance feedback method called translation enhancement (TE), which uses translation relationships extracted from relevant documents to revise the translation probabilities of query terms and to identify additional translation alternatives, so that the translated queries are better tuned to the current search. We studied TE using pseudo-relevance feedback (PRF) and interactive relevance feedback (IRF). Our results show that TE can significantly improve TLIR with both types of relevance feedback, and that the improvement is comparable to that of query expansion. More importantly, the effects of translation enhancement and query expansion are complementary. Their integration can produce further improvement and make TLIR more robust for a variety of queries.

3.
The quality of feedback documents is crucial to the effectiveness of query expansion (QE) in ad hoc retrieval. Recently, machine learning methods have been adopted to tackle this issue by training classifiers from feedback documents. However, the lack of proper training data has prevented these methods from selecting good feedback documents. In this paper, we propose a new method, called AdapCOT, which applies co-training in an adaptive manner to select feedback documents for boosting QE's effectiveness. Co-training is an effective technique for classification over limited training data, which is particularly suitable for selecting feedback documents. The proposed AdapCOT method makes use of a small set of training documents, and labels the feedback documents according to their quality through an iterative process. Two exclusive sets of term-based features are selected to train the classifiers. Finally, QE is performed on the labeled positive documents. Our extensive experiments show that the proposed method improves QE's effectiveness, and outperforms strong baselines on various standard TREC collections.

4.
With the ever-increasing size of the web, extracting relevant information from the Internet with a query formed of a few keywords has become a big challenge. Query expansion (QE) plays a crucial role in improving searches on the Internet. Here, the user's initial query is reformulated by adding meaningful terms with similar significance. QE, as part of information retrieval (IR), has long attracted researchers' attention. It has become very influential in the fields of personalized social documents, question answering, cross-language IR, information filtering and multimedia IR. Research in QE has gained further prominence because of IR-dedicated conferences such as TREC (Text REtrieval Conference) and CLEF (Conference and Labs of the Evaluation Forum). This paper surveys QE techniques in IR from 1960 to 2017 with respect to core techniques, data sources used, weighting and ranking methodologies, user participation and applications, bringing out similarities and differences.

5.
We present new methods of query expansion using terms that form lexical cohesive links between the contexts of distinct query terms in documents (i.e., the words surrounding the query terms in text). The link-forming terms (link-terms) and the short snippets of text surrounding them are evaluated in both interactive and automatic query expansion (QE). We explore the effectiveness of snippets in providing context in interactive query expansion, and compare query expansion from snippets vs. whole documents, and query expansion following snippet selection vs. full-document relevance judgements. The evaluation, conducted on the HARD track data of TREC 2005, suggests that there are considerable advantages in using link-terms and their surrounding short text snippets in QE compared to terms selected from the full text of documents.

6.
In this paper, we aim to improve query expansion for ad hoc retrieval by proposing a more fine-grained term reweighting process. This fine-grained process uses statistics from the representation of documents in various fields, such as their titles, the anchor text of their incoming links, and their body content. The contribution of this paper is twofold. First, we propose a novel query expansion mechanism on fields that combines the field evidence available in a corpus. Second, we propose an adaptive query expansion mechanism that selects an appropriate collection resource, either the local collection or a high-quality external resource, for query expansion on a per-query basis. The two proposed query expansion approaches are thoroughly evaluated using two standard Text Retrieval Conference (TREC) Web collections, namely the WT10G collection and the large-scale .GOV2 collection. From the experimental results, we observe a statistically significant improvement compared with the baselines. Moreover, we conclude that the adaptive query expansion mechanism is very effective when the external collection used is much larger than the local collection.

7.
The term mismatch problem is a critical problem in information retrieval, and several techniques, such as query expansion, cluster-based retrieval and dimensionality reduction, have been developed to resolve it. Of these techniques, this paper performs an empirical study of query expansion and cluster-based retrieval. We examine the effect of using parsimony in query expansion and the effect of clustering algorithms in cluster-based retrieval. In addition, query expansion and cluster-based retrieval are compared, and their combinations are evaluated in terms of retrieval performance through experiments on seven test collections from NTCIR and TREC.

8.
Pseudo-relevance feedback (PRF) is a classical technique for improving search engine retrieval effectiveness by closing the vocabulary gap between users' query formulations and the relevant documents. While PRF is typically applied on the same target corpus as the final retrieval, external expansion techniques have sometimes been applied in the past to obtain a high-quality pseudo-relevant feedback set from an external corpus. However, such external expansion approaches have only been studied for sparse (BoW) retrieval methods, and their effectiveness for recent dense retrieval methods remains under-investigated. Indeed, dense retrieval approaches such as ANCE and ColBERT, which conduct similarity search based on encoded contextualised query and document embeddings, are of increasing importance. Moreover, pseudo-relevance feedback mechanisms have been proposed to further enhance dense retrieval effectiveness. In this work, we examine the application of dense external expansion to improve zero-shot retrieval effectiveness, i.e. evaluation on corpora without further training. Zero-shot retrieval experiments on six datasets (two TREC datasets and four BEIR datasets), with the MSMARCO passage collection as the external corpus, indicate that obtaining external feedback documents using ColBERT can significantly improve NDCG@10 for sparse retrieval (by up to 28%) and for dense retrieval (by up to 12%). In addition, using ANCE on the external corpus brings up to 30% NDCG@10 improvement for sparse retrieval and up to 29% for dense retrieval.
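For readers unfamiliar with the NDCG@10 metric reported above, the following is a minimal sketch of how it is commonly computed. The abstract does not say which gain function was used; this sketch assumes the linear-gain variant (another common variant uses 2^rel - 1), and the function names are illustrative.

```python
from math import log2
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranks (linear gain assumed)."""
    return sum(g / log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_gains: Sequence[float],
              all_gains: Sequence[float],
              k: int = 10) -> float:
    """NDCG@k for one query.

    ranked_gains: graded relevance of the returned documents, in rank order.
    all_gains: graded relevance of every judged document for the query,
               used to build the ideal ranking.
    """
    ideal = dcg_at_k(sorted(all_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


# A relevant document returned at rank 2 instead of rank 1.
score = ndcg_at_k(ranked_gains=[0, 1], all_gains=[1, 0], k=10)
print(round(score, 4))
```

A relative improvement such as "up to 28%" then refers to the change in this score, averaged over queries, between the baseline and the expanded run.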

9.
10.
杨韦洁  高珑  苏静 《现代情报》2014,34(7):78-82,87
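Keyword-based P2P query expansion in traditional digital libraries does not adequately interpret the semantics of users' search terms. To address this shortcoming, this paper proposes a semantics-based node query expansion method for P2P environments. By combining a keyword association table with an ontology, it implements a personalized query expansion method, and uses this method to implement interest-network-based search in P2P, which can substantially improve retrieval efficiency.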

11.
This is a thorough analysis of two techniques applied to Geographic Information Retrieval (GIR). Previous studies have researched the application of query expansion to improve the selection process of information retrieval systems. This paper instead emphasizes the effectiveness of filtering relevant documents in a GIR system. Several experiments have been run on the available CLEF (Cross Language Evaluation Forum) framework, some based on query expansion and some on the filtering of relevant documents. The results show that filtering works better in a GIR environment, because relevant documents are not reordered in the final list.

12.
Recent developments have shown that entity-based models that rely on information from the knowledge graph can improve document retrieval performance. However, given the non-transitive nature of relatedness between entities on the knowledge graph, the use of semantic relatedness measures can lead to topic drift. To address this issue, we propose a relevance-based model for entity selection based on pseudo-relevance feedback, which is then used to systematically expand the input query, leading to improved retrieval performance. We perform our experiments on the widely used TREC Web corpora and empirically show that our proposed approach to entity selection significantly improves ad hoc document retrieval compared to strong baselines. More concretely, the contributions of this work are as follows: (1) We introduce a graphical probability model that captures dependencies between entities within the query and documents. (2) We propose an unsupervised entity selection method based on the graphical model for query entity expansion and then for ad hoc retrieval. (3) We thoroughly evaluate our method and compare it with state-of-the-art keyword-based and entity-based retrieval methods. We demonstrate that the proposed retrieval model shows improved performance over all the other baselines on ClueWeb09B and ClueWeb12B, two widely used Web corpora, on the [email protected], and [email protected] metrics. We also show that the proposed method is most effective on difficult queries. In addition, we compare our proposed entity selection with a state-of-the-art entity selection technique within the context of ad hoc retrieval using a basic query expansion method, and illustrate that it provides more effective retrieval for all expansion weights and for different numbers of expansion entities.

13.
Among the problems associated with modern information retrieval systems is the lack of any systematic approach to the design of query language interfaces. In this paper we attempt to show how a relationally organised data base is well suited to bibliographic data management and how, given such a relational organisation, it is possible to construct an interface which separates the query language from the physical representation of the data base. It is also shown how such a query language organisation may be usefully interfaced to existing retrieval systems. Finally, a query language for retrieval applications is proposed.

14.
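This article applies several different rules and methods of query expansion to design a text retrieval system with selectable expansion functions. In this system, users can choose the expansion function that matches the kind of expanded search they want for the information they are seeking. This makes retrieval better and more flexibly tailored to user needs, avoids returning large amounts of irrelevant information, and thereby improves retrieval performance.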

15.
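A completely weighted association rule mining algorithm based on two pruning passes   Total citations: 1, self-citations: 0, citations by others: 1
Completely weighted association rule mining has high application value in query expansion for information retrieval. Building on an in-depth study of existing algorithms, this paper proposes a completely weighted association rule mining algorithm based on two pruning passes, presents the related theorems, and reports the corresponding experiments. The algorithm adopts a two-pass pruning strategy that markedly reduces the number of candidate itemsets and the mining time, improving mining efficiency. Experimental results demonstrate the effectiveness of the algorithm: compared with existing algorithms, mining efficiency is indeed improved.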

16.
We compare a user-defined passage feedback (pf) system to a document feedback (df) system. Df employed the adaptive linear model for retrieval, while pf used weighted query expansion based on positive and negative feedback. Twenty-four searchers performed the same six tasks in varying search and system-order per TREC-8 guidelines. We hypothesized that pf, which featured interactive query expansion, would outperform df, which relied on automatic query expansion. Initial analysis appeared to reject this hypothesis, as df showed slightly higher overall performance than pf. However, analysis by system-order groups indicates only the first pf use had lower performance. These data suggest that pf was more difficult to learn than df, though the second pf use yielded competitive performance. If performance of pf is indeed affected by learning, an improved pf system with usability enhancements may prove to be an effective mechanism for interactive information retrieval.

17.
In this paper, we compare collocation-based similarity measures (Jaccard, Dice and cosine) for the proper selection of additional search terms in query expansion. In addition, we consider two more similarity measures: average conditional probability (ACP) and normalized mutual information (NMI). ACP is the mean of the two conditional probabilities between a query term and an additional search term. NMI is a normalized value of the two terms' mutual information. All of these similarity measures are functions of the two terms' frequencies and their collocation frequency, but they differ in how similarity is measured. The selected measure changes the order of the additional search terms and their weights, and hence has a strong influence on retrieval performance. In our query expansion experiments using these five similarity measures, the additional search terms selected by the Jaccard, Dice and cosine measures include more frequent terms with lower similarity values than those selected by ACP or NMI. In the overall assessment of query expansion, the Jaccard, Dice and cosine measures are better than ACP and NMI in terms of retrieval effectiveness, whereas NMI and ACP are better in terms of execution efficiency.
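Four of the five measures can be written directly in terms of the two term frequencies and the collocation frequency. The sketch below uses the standard textbook definitions of Jaccard, Dice and cosine, and takes ACP as the mean of the two conditional probabilities as the abstract describes. NMI is omitted because the abstract does not state which normalization is used. The function name, the counting unit (text windows) and the sample counts are assumptions.

```python
from math import sqrt
from typing import Dict


def collocation_similarities(f_x: int, f_y: int, f_xy: int) -> Dict[str, float]:
    """Return Jaccard, Dice, cosine and ACP scores for two terms.

    f_x, f_y: number of text windows containing each term (assumed unit).
    f_xy: number of windows containing both terms (collocation frequency).
    """
    if f_x <= 0 or f_y <= 0:
        raise ValueError("term frequencies must be positive")
    if not 0 <= f_xy <= min(f_x, f_y):
        raise ValueError("collocation frequency must lie in [0, min(f_x, f_y)]")
    return {
        "jaccard": f_xy / (f_x + f_y - f_xy),
        "dice": 2 * f_xy / (f_x + f_y),
        "cosine": f_xy / sqrt(f_x * f_y),
        # ACP: mean of P(y | x) = f_xy / f_x and P(x | y) = f_xy / f_y.
        "acp": 0.5 * (f_xy / f_x + f_xy / f_y),
    }


# A frequent query term (100 windows) and a rarer candidate term (25 windows)
# that co-occur in 20 windows.
scores = collocation_similarities(f_x=100, f_y=25, f_xy=20)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these counts ACP (0.5) scores the pair higher than cosine (0.4), Dice (0.32) or Jaccard (about 0.19), because the rarer term almost always co-occurs with the frequent one. This illustrates how the choice of measure can reorder candidate expansion terms.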

18.
Many of the approaches to image retrieval on the Web have their basis in text retrieval. However, when searchers are asked to describe their image needs, the resulting query is often short and potentially ambiguous. The solution we propose is to perform automatic query expansion using Wikipedia as the source knowledge base, resulting in a diversification of the search results. The outcome is a broad range of images that represent the various possible interpretations of the query. In order to assist the searcher in finding images that match their specific intentions for the query, we have developed an image organization method that uses both the conceptual information associated with each image, and the visual features extracted from the images. This, coupled with a hierarchical organization of the concepts, provides an interactive interface that takes advantage of the searchers' abilities to recognize relevant concepts, filter and focus the search results based on these concepts, and visually identify relevant images while navigating within the image space. In this paper, we outline the key features of our image retrieval system (CIDER), and present the results of a preliminary user evaluation. The results of this study illustrate the potential benefits that CIDER can provide for searchers conducting image retrieval tasks.

19.
Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches to similarity search first compute the pairwise similarity score between each document and the query using a retrieval function or similarity measure (e.g. cosine), and then rank the documents by their similarity scores. In this paper, we propose a novel retrieval approach based on manifold-ranking of document blocks (a block being a span of coherent text about a subtopic) to re-rank a small set of documents initially retrieved by some existing retrieval function. The proposed approach can make full use of the intrinsic global manifold structure of the document blocks by propagating the ranking scores between the blocks on a weighted graph. First, the TextTiling algorithm and the VIPS algorithm are employed to segment text documents and web pages, respectively, into blocks. Then, each block is assigned a ranking score by the manifold-ranking algorithm. Lastly, a document gets its final ranking score by fusing the scores of its blocks. Experimental results on the TDT data and the ODP data demonstrate that the proposed approach can significantly improve retrieval performance over baseline approaches. The document block is validated to be a better unit than the whole document in the manifold-ranking process.
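The score propagation step can be sketched as follows. The abstract does not give the update rule, so this sketch assumes a commonly used formulation of manifold ranking: a symmetrically normalised affinity matrix S and the iteration f = alpha * S * f + (1 - alpha) * y, where y marks the query blocks. The function name, the toy weights and the parameter values are illustrative, and block segmentation and score fusion are not shown.

```python
from math import sqrt
from typing import List


def manifold_rank(weights: List[List[float]],
                  prior: List[float],
                  alpha: float = 0.5,
                  iterations: int = 100) -> List[float]:
    """Propagate ranking scores over a weighted block graph.

    weights: symmetric block-to-block similarity matrix.
    prior: initial scores y (1.0 for query blocks, 0.0 otherwise).
    """
    n = len(weights)
    # Zero the diagonal so a block does not reinforce itself.
    w = [[0.0 if i == j else weights[i][j] for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in w]
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    s = [
        [
            w[i][j] / sqrt(degree[i] * degree[j]) if degree[i] and degree[j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = list(prior)
    for _ in range(iterations):
        # Each block takes a share of its neighbours' scores and keeps
        # part of its own prior.
        scores = [
            alpha * sum(s[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * prior[i]
            for i in range(n)
        ]
    return scores


# Block 0 belongs to the query document; block 1 is strongly similar to it,
# block 2 only weakly.
block_weights = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.2],
    [0.1, 0.2, 0.0],
]
block_scores = manifold_rank(block_weights, prior=[1.0, 0.0, 0.0])
print([round(x, 3) for x in block_scores])
```

In this toy graph the block that is strongly linked to the query block ends up ranked above the weakly linked one, which is the behaviour the propagation is meant to produce.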

20.
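[Purpose/Significance] This work explores how to address shortcomings in the existing retrieval and analysis of English and Japanese literature, and how to achieve unified retrieval of Chinese, English and Japanese literature resources using Chinese as the query language. [Method/Process] It introduces a unified retrieval system for Chinese, Japanese and English literature resources that uses machine translation to support Chinese as the query language. [Result/Conclusion] Based on MASH-UP technology, the system integrates the search results for unified content analysis, helping users overcome language barriers and conveniently search and use foreign-language literature.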
