Similar Documents
20 similar documents found
1.
As an effective technique for improving retrieval effectiveness, relevance feedback (RF) has been widely studied in both monolingual and translingual information retrieval (TLIR). Studies of RF in TLIR have focused on query expansion (QE), in which queries are reformulated before and/or after they are translated. However, RF in TLIR can not only help select better query terms but also enhance query translation by adjusting translation probabilities and even resolving some out-of-vocabulary terms. In this paper, we propose a novel relevance feedback method called translation enhancement (TE), which uses translation relationships extracted from relevant documents to revise the translation probabilities of query terms and to identify additional translation alternatives, so that the translated queries are better tuned to the current search. We studied TE using pseudo-relevance feedback (PRF) and interactive relevance feedback (IRF). Our results show that TE can significantly improve TLIR with both types of relevance feedback, and that the improvement is comparable to that of query expansion. More importantly, the effects of translation enhancement and query expansion are complementary: their integration can produce further improvement and make TLIR more robust across a variety of queries.
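The sketch below illustrates the general idea of revising translation probabilities with feedback evidence. It assumes a simple linear interpolation between the dictionary distribution and the relative frequency of each candidate translation in the feedback documents. The function name, the weight `lam` and the count-based estimate are hypothetical illustrations, not the TE formulation from the paper.

```python
def enhance_translations(dict_probs, feedback_counts, lam=0.5):
    """Interpolate dictionary translation probabilities with feedback evidence.

    dict_probs: {candidate_translation: probability} from a bilingual dictionary
    feedback_counts: {candidate_translation: count} observed in feedback documents;
        may contain candidates missing from the dictionary (e.g. for OOV terms)
    lam: hypothetical weight given to the dictionary distribution
    """
    total = sum(feedback_counts.values())
    candidates = set(dict_probs) | set(feedback_counts)
    revised = {}
    for c in candidates:
        fb = feedback_counts.get(c, 0) / total if total else 0.0
        revised[c] = lam * dict_probs.get(c, 0.0) + (1 - lam) * fb
    # Renormalize so the revised values form a probability distribution.
    norm = sum(revised.values())
    return {c: p / norm for c, p in revised.items()} if norm else revised
```

Take a query term with two dictionary senses, `t_finance` and `t_river`. Feedback documents that mostly use `t_finance` and also contain an unseen candidate `t_new` would shift probability toward those two candidates.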

2.
Query expansion (QE) is one of the most important mechanisms in the information retrieval field. A typical short Internet query goes through a process of refinement to improve its retrieval power. Most existing QE techniques suffer from degraded retrieval performance due to an imprecise choice of the terms added to the query. In this paper, we introduce a novel automated QE mechanism. The new expansion process is guided by the semantic relations between the original query and the expanding words, in the context of the corpus used. Experimental results of our “controlled” query expansion on the Arabic TREC-10 data show a significant improvement in recall and precision over existing mechanisms in the field.

3.
Searching a large document collection for relevant material that satisfies a user’s information need is a critical activity for web search engines. Query expansion (QE) techniques are widely used by search engines to disambiguate the user’s information need and to improve information retrieval (IR) performance. Knowledge-based, corpus-based, and relevance feedback techniques are the main QE techniques. They employ different approaches to expand the user query with synonyms of the search terms (word synonymy), in order to bring in more relevant documents, and to filter out documents that contain the search terms but with a different meaning than the user intended (the word polysemy problem). This work surveys existing query expansion techniques, highlights their strengths and limitations, and introduces a new method that combines the power of knowledge-based or corpus-based techniques with that of relevance feedback. Experimental evaluation on three information retrieval benchmark datasets shows that applying knowledge-based or corpus-based query expansion to the results of the relevance feedback step improves information retrieval performance, with knowledge-based techniques providing significantly better results than their simple relevance feedback alternatives on all datasets.

4.
In this era, the growing role of social media in our lives has popularized the posting of short texts. Short texts contain limited context and have unique characteristics that make them difficult to handle. Every day, billions of short texts are produced in the form of tags, keywords, tweets, phone messages, messenger conversations, social network posts, etc. The analysis of these short texts is imperative in the field of text mining and content analysis. Extracting precise topics from large-scale short text documents is a critical and challenging task. Conventional approaches fail to capture word co-occurrence patterns in topics because of the sparsity of short texts such as web text, social media posts (e.g., on Twitter), and news headlines. Therefore, in this paper, the sparsity problem is ameliorated by presenting a novel fuzzy topic modeling (FTM) approach for short text from a fuzzy perspective. In this research, local and global term frequencies are computed through a bag-of-words (BOW) model. To remove the negative impact of high dimensionality on global term weighting, principal component analysis is adopted; the fuzzy c-means algorithm is then employed to retrieve semantically relevant topics from the documents. Experiments are conducted on three real-world short text datasets: the snippets dataset is a small dataset, whereas the other two, Twitter and questions, are larger. Experimental results show that the proposed approach discovered topics more precisely and performed better than other state-of-the-art baseline topic models such as GLTM, CSTM, LTM, LDA, Mix-gram, BTM, SATM, and DREx+LDA. The performance of FTM is also demonstrated in classification, clustering, topic coherence, and execution time. The classification accuracy of FTM on the snippets dataset is 0.95, 0.94, 0.91, 0.89, and 0.87 with 50, 75, 100, 125, and 200 topics, respectively.
The classification accuracy of FTM on the questions dataset is 0.73, 0.74, 0.70, 0.68, and 0.78 with 50, 75, 100, 125, and 200 topics, respectively. The classification accuracies of FTM on the snippets and questions datasets are higher than those of the state-of-the-art baseline topic models.
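The clustering step named above is fuzzy c-means. The sketch below is a generic, textbook fuzzy c-means in pure Python. It assumes the document vectors have already been weighted and reduced (the BOW and PCA steps are omitted). The fuzzifier `m=2`, the random initialization and the toy 2-D points are illustrative choices, not settings from the paper.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships).

    points: list of equal-length numeric vectors (e.g. reduced term weights)
    c: number of clusters (topics); m > 1 is the fuzzifier
    memberships[i][j] is the degree to which point i belongs to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centers = []
    for _ in range(iters):
        # Centers are membership^m weighted means of the points.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Memberships follow from relative distances to every center.
        exp = 2.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if any(d == 0.0 for d in dists):
                # Point sits exactly on a center: give it crisp membership.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(row)
                row = [x / s for x in row]
            else:
                row = [1.0 / sum((dists[j] / dists[k]) ** exp for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

Unlike hard k-means, every document keeps a graded membership in every cluster. This is the "fuzzy perspective" that lets a short text belong partly to several topics.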

5.
The quality of feedback documents is crucial to the effectiveness of query expansion (QE) in ad hoc retrieval. Recently, machine learning methods have been adopted to tackle this issue by training classifiers from feedback documents. However, the lack of proper training data has prevented these methods from selecting good feedback documents. In this paper, we propose a new method, called AdapCOT, which applies co-training in an adaptive manner to select feedback documents for boosting QE’s effectiveness. Co-training is an effective technique for classification over limited training data, which makes it particularly suitable for selecting feedback documents. The proposed AdapCOT method makes use of a small set of training documents and labels the feedback documents according to their quality through an iterative process. Two mutually exclusive sets of term-based features are selected to train the classifiers. Finally, QE is performed on the documents labeled as positive. Our extensive experiments show that the proposed method improves QE’s effectiveness and outperforms strong baselines on various standard TREC collections.

6.
The importance of query performance prediction has been widely acknowledged in the literature, especially for query expansion, refinement, and interpolating different retrieval approaches. This paper proposes a novel semantics-based query performance prediction approach based on estimating semantic similarities between queries and documents. We introduce three post-retrieval predictors, namely (1) semantic distinction, (2) semantic query drift, and (3) semantic cohesion, based respectively on (1) the semantic similarity of a query to the top-ranked documents compared to the whole collection, (2) the estimation of non-query-related aspects of the retrieved documents using semantic measures, and (3) the semantic cohesion of the retrieved documents. We assume that queries and documents are modeled as sets of entities from a knowledge graph, e.g., DBpedia concepts, instead of bags of words. With this assumption, semantic similarities between two texts are measured based on the relatedness between entities, which is learned from the contextual information represented in the knowledge graph. We empirically illustrate these predictors’ effectiveness, especially when term-based measures fail to quantify query performance prediction hypotheses correctly. We report our findings on the proposed predictors’ performance and their interpolation on three standard collections, namely ClueWeb09-B, ClueWeb12-B, and Robust04. We show that the proposed predictors are effective across different datasets in terms of Pearson and Kendall correlation coefficients between the predicted performance and the average precision measured by relevance judgments.
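The sketch below shows how a cohesion-style post-retrieval predictor can be computed. It takes the mean pairwise similarity of the top-ranked documents. It assumes each document is a sparse `{entity: weight}` dict, and it substitutes plain cosine similarity for the knowledge-graph relatedness the paper learns. The function names are hypothetical.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cohesion(top_docs):
    """Mean pairwise similarity of the top-ranked documents.

    Higher values suggest a topically focused result list, which
    cohesion-style predictors read as a sign of better query performance.
    Cosine here stands in for learned entity relatedness (an assumption).
    """
    pairs = list(combinations(top_docs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

For example, top documents that all mention `Paris` and `France` score near 1. A result list that mixes unrelated entities scores near 0.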

7.
In this paper, we aim to improve query expansion for ad hoc retrieval by proposing a more fine-grained term reweighting process. This fine-grained process uses statistics from the representation of documents in various fields, such as their titles, the anchor text of their incoming links, and their body content. The contribution of this paper is twofold. First, we propose a novel query expansion mechanism on fields that combines the field evidence available in a corpus. Second, we propose an adaptive query expansion mechanism that selects an appropriate collection resource, either the local collection or a high-quality external resource, for query expansion on a per-query basis. The two proposed query expansion approaches are thoroughly evaluated using two standard Text Retrieval Conference (TREC) Web collections, namely the WT10G collection and the large-scale .GOV2 collection. The experimental results show a statistically significant improvement compared with the baselines. Moreover, we conclude that the adaptive query expansion mechanism is very effective when the external collection used is much larger than the local collection.
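The sketch below illustrates the idea of combining field evidence when scoring expansion terms. It sums per-field term frequencies with per-field weights and ranks candidate terms over the feedback documents by the resulting pseudo-frequency. The field weights and function names are hypothetical. This is a generic field-weighting scheme, not the paper's exact expansion mechanism.

```python
def field_weighted_tf(doc_fields, term, weights):
    """Combine a term's frequency across document fields.

    doc_fields: {field_name: list of tokens}, e.g. title / anchor / body
    weights: {field_name: weight}; the values used below are illustrative only
    """
    return sum(weights.get(f, 0.0) * toks.count(term)
               for f, toks in doc_fields.items())

def top_expansion_terms(feedback_docs, weights, query_terms, k=3):
    """Rank non-query terms by summed field-weighted frequency in feedback docs."""
    scores = {}
    for doc in feedback_docs:
        vocab = {t for toks in doc.values() for t in toks}
        for t in vocab - set(query_terms):
            scores[t] = scores.get(t, 0.0) + field_weighted_tf(doc, t, weights)
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Weighting titles and anchor text above body text lets a term that appears once in a title outrank one repeated in the body. The per-field statistics are what make the reweighting fine-grained.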

8.
Pseudo-relevance feedback (PRF) is a well-known method for addressing the mismatch between query intention and query representation. Most current PRF methods consider relevance matching only from the perspective of the terms used to rank feedback documents, which may lead to a semantic gap between query representation and document representation. In this work, a PRF framework that combines relevance matching and semantic matching is proposed to improve the quality of feedback documents. Specifically, in the first round of retrieval, we propose a reranking mechanism in which exact-term matching information and the semantic similarity between the query and document representations are computed with bidirectional encoder representations from transformers (BERT); this mechanism narrows the semantic gap by using semantic information and improves the quality of feedback documents. Then, our proposed PRF framework processes the results of the first round of retrieval using probability-based and language-model-based PRF methods. Finally, we conduct extensive experiments on four Text Retrieval Conference (TREC) datasets. The results show that the proposed models outperform the robust baseline models in terms of mean average precision (MAP) and precision at rank 10 (P@10). They also highlight that combining relevance matching and semantic matching is more effective than using either alone for improving the quality of feedback documents.
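The sketch below shows one simple way to combine relevance matching and semantic matching when choosing feedback documents. It min-max normalizes a lexical score (e.g. from BM25) and a semantic score (e.g. from BERT), interpolates them with a weight `alpha`, and keeps the top `k`. Both score dicts are assumed to be precomputed. The interpolation form, `alpha` and the function names are assumptions, not the paper's exact reranking model.

```python
def minmax(scores):
    """Rescale a {doc_id: score} dict to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def rerank_for_feedback(lexical, semantic, alpha=0.5, k=2):
    """Pick feedback documents by interpolating normalized scores.

    lexical: {doc_id: score} from first-round term-based retrieval
    semantic: {doc_id: score} from a neural matcher (precomputed here)
    alpha: hypothetical weight on the lexical evidence
    """
    lex, sem = minmax(lexical), minmax(semantic)
    combined = {d: alpha * lex[d] + (1 - alpha) * sem.get(d, 0.0) for d in lex}
    return sorted(combined, key=lambda d: -combined[d])[:k]
```

With `alpha=1.0` this reduces to the purely lexical ranking. Lowering `alpha` lets a semantically close document with fewer exact term matches enter the feedback set.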

9.
Facet-based opinion retrieval from blogs
The paper presents methods of retrieving blog posts containing opinions about an entity expressed in the query. The methods use a lexicon of subjective words and phrases compiled from manually and automatically developed resources. One of the methods uses the Kullback–Leibler divergence to weight subjective words occurring near query terms in documents, another uses proximity between the occurrences of query terms and subjective words in documents, and the third combines both factors. Methods of structuring queries into facets, facet expansion using Wikipedia, and a facet-based retrieval are also investigated in this work. The methods were evaluated using the TREC 2007 and 2008 Blog track topics, and proved to be highly effective.
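The sketch below illustrates the two weighting factors described above. The first gives each subjective word its pointwise contribution to the Kullback–Leibler divergence between an opinionated reference set and the whole collection. The second sums those weights for subjective words within a fixed window of a query term. The window size, the reference-set counts and the function names are assumptions, not the paper's exact formulation.

```python
import math

def kl_weight(term, rel_counts, coll_counts):
    """Pointwise KL contribution of a term: p_rel * log(p_rel / p_coll).

    rel_counts / coll_counts: {term: count} in an opinionated reference set
    and in the whole collection (the reference set is assumed to be part of
    the collection, so p_coll > 0 whenever p_rel > 0).
    """
    p_rel = rel_counts.get(term, 0) / sum(rel_counts.values())
    p_coll = coll_counts.get(term, 0) / sum(coll_counts.values())
    if p_rel == 0.0 or p_coll == 0.0:
        return 0.0
    return p_rel * math.log(p_rel / p_coll)

def proximity_score(tokens, query_terms, weights, window=5):
    """Sum weights of subjective words within `window` tokens of a query term."""
    q_pos = [i for i, t in enumerate(tokens) if t in query_terms]
    score = 0.0
    for i, t in enumerate(tokens):
        if t in weights and any(abs(i - j) <= window for j in q_pos):
            score += weights[t]
    return score
```

A combined method could, for instance, feed `kl_weight` values in as the `weights` of `proximity_score`. Then only strongly subjective words that occur close to the query entity raise a post's opinion score.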

10.
Interactive query expansion (IQE) (cf. [Efthimiadis, E. N. (1996). Query expansion. Annual Review of Information Science and Technology, 31, 121–187]) is a potentially useful technique to help searchers formulate improved query statements and ultimately retrieve better search results. However, IQE is seldom used in operational settings. Two possible explanations for this are that IQE is generally not integrated into searchers’ established information-seeking behaviors (e.g., examining lists of documents), and that it may not be offered at the point in the search when it is needed most (i.e., during initial query formulation). These challenges can be addressed by coupling IQE more closely with familiar search activities, rather than offering it as a separate functionality that searchers must learn. In this article we introduce and evaluate a variant of IQE known as Real-Time Query Expansion (RTQE). As searchers enter their query in a text box at the interface, RTQE provides a list of suggested additional query terms, in effect offering query expansion options while the query is being formulated. To investigate how the technique is used – and when it may be useful – we conducted a user study comparing three search interfaces: a baseline interface with no query expansion support, an interface that provides expansion options during query entry, and a third interface that provides options after queries have been submitted to a search system. The results show that offering RTQE leads to better-quality initial queries, more engagement in the search, and an increase in the uptake of query expansion. However, the results also imply that care must be taken when implementing RTQE interactively. Our findings have broad implications for how IQE should be offered, and form part of our research on the development of techniques to support the increased use of query expansion.

11.
This study addresses the question of whether the way in which sets of query terms are identified has an impact on the effectiveness of users’ information seeking efforts. Query terms are text strings used as input to an information access system; they are products of a method or grammar that identifies a set of query terms. We conducted an experiment that compared the effectiveness of sets of query terms identified for a single book by three different methods. One had been previously prepared by a human indexer for a back-of-the-book index. The other two were identified by computer programs that used a combination of linguistic and statistical criteria to extract terms from full text. Effectiveness was measured by (1) whether selected query terms led participants to correct answers and (2) how long it took participants to obtain correct answers. Our results show that two sets of terms – the human terms and the set selected according to the linguistically more sophisticated criteria – were significantly more effective than the third set of terms. This single case demonstrates that query languages do have a measurable impact on the effectiveness of query term languages in the interactive information access process. The procedure described in this paper can be used to assess the effectiveness for information seekers of query terms identified by any query language.

12.
One of the major problems in information retrieval is the formulation of queries by the user. This entails specifying a set of words or terms that express their information need. However, it is well known that two people can use different terms to refer to the same concepts. Techniques that attempt to reduce this problem as much as possible generally start from an initial search and then study how the initial query can be modified to obtain better results. In general, constructing the new query involves expanding the terms of the initial query and recalculating the importance of each term in the expanded query. Depending on the technique used to formulate the new query, several strategies are distinguished. These strategies are based on the idea that if two terms are similar (with respect to some criterion), the documents in which both terms appear frequently will also be related. The technique used in this study is known as query expansion using similarity thesauri.
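The sketch below is a simplified version of the similarity-thesaurus idea. Each term is represented by its distribution over documents, and terms are compared by cosine similarity. The query is then expanded with the terms whose summed similarity to all query terms is highest. Raw counts are used in place of the more elaborate weighting a real similarity thesaurus applies, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def term_vectors(docs):
    """Map each term to its {doc_index: frequency} vector."""
    vecs = defaultdict(dict)
    for i, doc in enumerate(docs):
        for t in doc:
            vecs[t][i] = vecs[t].get(i, 0) + 1
    return vecs

def sim(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(k, 0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand(query, docs, k=2):
    """Append the k terms most similar to the query as a whole."""
    vecs = term_vectors(docs)
    scores = {t: sum(sim(vecs[t], vecs[q]) for q in query if q in vecs)
              for t in vecs if t not in query}
    best = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return list(query) + best
```

Terms are scored against the whole query rather than against each query term separately. This follows the "two similar terms imply related documents" assumption stated above.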

13.
This paper describes an automatic approach designed to improve the retrieval effectiveness of very short queries such as those used in web searching. The method is based on the observation that stemming, which is designed to maximize recall, often results in depressed precision. Our approach is based on pseudo-feedback and attempts to increase the number of relevant documents in the pseudo-relevant set by reranking those documents based on the presence of unstemmed query terms in the document text. The original experiments underlying this work were carried out using Smart 11.0 and the lnc.ltc weighting scheme on three sets of documents from the TREC collection with corresponding TREC (title only) topics as queries. (The average length of these queries after stoplisting ranges from 2.4 to 4.5 terms.) Results, evaluated in terms of P@20 and non-interpolated average precision, showed clearly that pseudo-feedback (PF) based on this approach was effective in increasing the number of relevant documents in the top ranks. Subsequent experiments, performed on the same data sets using Smart 13.0 and the improved Lnu.ltu weighting scheme, indicate that these results hold up even over the much higher baseline provided by the new weights. Query drift analysis presents a more detailed picture of the improvements produced by this process.
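The sketch below shows the reranking step in a simplified form. The top `n` documents of a stemmed first-pass ranking are reordered by how many distinct original (unstemmed) query terms appear verbatim in their text, and ties keep their first-pass order. Whitespace tokenization, the exact count as sort key and the function name are assumptions for illustration.

```python
def rerank_by_unstemmed(ranked_docs, query_terms, n=10):
    """Rerank the top-n documents by the number of distinct unstemmed query
    terms they contain verbatim; ties keep the original rank order.

    ranked_docs: list of (doc_id, text) in first-pass (stemmed) rank order
    """
    head, tail = ranked_docs[:n], ranked_docs[n:]

    def exact_matches(text):
        tokens = set(text.lower().split())
        return sum(1 for q in set(query_terms) if q.lower() in tokens)

    # sorted() is stable, so documents with equal counts keep first-pass order.
    head = sorted(head, key=lambda d: -exact_matches(d[1]))
    return head + tail
```

The reordered head then serves as the pseudo-relevant set for feedback. Documents that matched only through stemming (e.g. `computing` for `computer`) drop below exact matches.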

14.
Many of the approaches to image retrieval on the Web have their basis in text retrieval. However, when searchers are asked to describe their image needs, the resulting query is often short and potentially ambiguous. The solution we propose is to perform automatic query expansion using Wikipedia as the source knowledge base, resulting in a diversification of the search results. The outcome is a broad range of images that represent the various possible interpretations of the query. In order to assist the searcher in finding images that match their specific intentions for the query, we have developed an image organization method that uses both the conceptual information associated with each image, and the visual features extracted from the images. This, coupled with a hierarchical organization of the concepts, provides an interactive interface that takes advantage of the searchers’ abilities to recognize relevant concepts, filter and focus the search results based on these concepts, and visually identify relevant images while navigating within the image space. In this paper, we outline the key features of our image retrieval system (CIDER), and present the results of a preliminary user evaluation. The results of this study illustrate the potential benefits that CIDER can provide for searchers conducting image retrieval tasks.

15.
Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read and process because they are often domain-specific. In this paper, we propose a method for the automatic expansion of abbreviations using context and character information. In previous studies, dictionaries were used to search for abbreviation expansion candidates (candidate words for the original form of an abbreviation). Instead of a dictionary, we use a corpus from the same field that contains few abbreviations. We calculate the adequacy of each expansion candidate based on the similarity between the context of the target abbreviation and that of the candidate. The similarity is calculated using a vector space model in which each vector element corresponds to a word surrounding the target abbreviation or its expansion candidate. Experiments using approximately 10,000 documents in the field of aviation showed that the accuracy of the proposed method is 10% higher than that of previously developed methods.
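The sketch below illustrates the context-similarity part of the method. It builds a vector of the words within a fixed window around each occurrence of the abbreviation, builds the same kind of vector for each candidate expansion in a domain corpus, and ranks candidates by cosine similarity. The window size, whitespace tokens and the treatment of multi-word expansions as single underscore-joined tokens (e.g. `air_traffic_control`) are simplifying assumptions. The character-information component is omitted.

```python
import math
from collections import Counter

def context_vector(tokens, target, window=2):
    """Count words within `window` positions of each occurrence of target."""
    vec = Counter()
    for i, t in enumerate(tokens):
        if t == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            vec.update(w for j, w in enumerate(tokens[lo:hi], start=lo) if j != i)
    return vec

def cosine(a, b):
    """Cosine similarity of two Counters (missing keys count as 0)."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_expansions(abbr_tokens, abbr, corpus_tokens, candidates, window=2):
    """Order candidate expansions by context similarity to the abbreviation."""
    target = context_vector(abbr_tokens, abbr, window)
    scored = {c: cosine(target, context_vector(corpus_tokens, c, window))
              for c in candidates}
    return sorted(candidates, key=lambda c: -scored[c])
```

For an aviation text like "contact ATC before takeoff", the candidate `air_traffic_control` shares the surrounding words `contact`, `before` and `takeoff` in the corpus. It therefore outranks `automatic_train_control`.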

16.
This is a thorough analysis of two techniques applied to Geographic Information Retrieval (GIR). Previous studies have researched the application of query expansion to improve the selection process of information retrieval systems. This paper emphasizes the effectiveness of filtering relevant documents in a GIR system, instead of query expansion. Several experiments have been run within the available CLEF (Cross-Language Evaluation Forum) framework, some based on query expansion and others on the filtering of relevant documents. The results show that filtering works better in a GIR environment, because relevant documents are not reordered in the final list.

17.
This paper presents a Foreign-Language Search Assistant that uses noun phrases as the fundamental units for document translation and for query formulation, translation, and refinement. The system (a) supports the foreign-language document selection task by providing a cross-language indicative summary based on noun phrase translations, and (b) supports query formulation and refinement using the information displayed in the cross-language document summaries. Our results challenge two implicit assumptions in most cross-language information retrieval research: first, that once documents in the target language are found, machine translation is the optimal way of informing the user about their contents; and second, that in an interactive setting, the optimal way of formulating and refining the query is to help the user choose appropriate translations for the query terms.

18.
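From a practical perspective, this paper reviews and analyzes the functions and features of the retrieval tools provided by the Perseus Digital Library. These include explicit tools, such as word search, content query, and document retrieval, and implicit tools, such as automatic retrieval tools, interactive texts, annotations, cross-references, further reading, and content services. The analysis offers valuable material and practical experience for readers who read, study, and research the related literature.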

19.
Multimedia objects can be retrieved using their context, which can be, for instance, the text surrounding them in documents. This text may be either near to or far from the searched objects. Our goal in this paper is to study the impact, in terms of effectiveness, of the position of text relative to the searched objects. The multimedia objects we consider are described in structured documents such as XML documents. The document structure is therefore exploited to determine this text position in documents. Although structural information has been shown to be an effective source of evidence in textual information retrieval, only a few works have investigated its value in multimedia retrieval. More precisely, the task we are interested in here is retrieving multimedia fragments (i.e., XML elements that contain at least one multimedia object). Our general approach has two steps: we first retrieve XML elements containing multimedia objects, and then explore the surrounding information to retrieve relevant multimedia fragments. In both cases, we study the impact of the surrounding information using the document structure.

20.
In this paper we present a new algorithm for relevance feedback (RF) in information retrieval. Unlike conventional RF algorithms, which use the top-ranked documents for feedback, our proposed algorithm is a kind of active feedback algorithm that actively chooses documents for the user to judge. The objectives are (a) to increase the number of judged relevant documents and (b) to increase the diversity of judged documents during the RF process. The algorithm uses document contexts by splitting the retrieval list into sub-lists according to the query term patterns that exist in the top-ranked documents. Query term patterns include a single query term, a pair of query terms that occur in a phrase, and query terms that occur in proximity. The algorithm is iterative and takes one document for feedback in each iteration. We experiment with the algorithm using the TREC-6, -7, -8, -2005, and GOV2 data collections, and we simulate user feedback using the TREC relevance judgements. The experimental results show that our proposed split-list algorithm is better than the conventional RF algorithm and more reliable than a similar algorithm using maximal marginal relevance.
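The sketch below illustrates the split-list idea in a reduced form. It groups a ranked list into sub-lists keyed by which query terms each document contains, then picks one document per step while cycling across the sub-lists, so that judged documents cover different term patterns. Only term-presence patterns are used; phrase and proximity patterns are omitted. The round-robin selection order and the function names are assumptions, not the paper's exact procedure.

```python
def split_lists(ranked_docs, query_terms):
    """Split a ranked list into sub-lists keyed by the set of query terms present.

    ranked_docs: list of (doc_id, text) in rank order. Only term-presence
    patterns are modeled; phrase and proximity patterns are omitted.
    """
    groups = {}
    for doc_id, text in ranked_docs:
        tokens = set(text.lower().split())
        key = frozenset(q for q in query_terms if q in tokens)
        groups.setdefault(key, []).append(doc_id)
    # Dicts preserve insertion order, so sub-lists appear by their best rank.
    return list(groups.values())

def select_for_judgment(ranked_docs, query_terms, budget):
    """Pick documents to judge, one per iteration, cycling over sub-lists."""
    sublists = split_lists(ranked_docs, query_terms)
    chosen = []
    while len(chosen) < budget and any(sublists):
        for sub in sublists:
            if sub and len(chosen) < budget:
                chosen.append(sub.pop(0))
    return chosen
```

Conventional top-k feedback would judge the first four documents in rank order. This selection instead interleaves documents that match only some of the query terms, which is one way to pursue the diversity objective stated above.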
