首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 558 毫秒
1.
交互式跨语言信息检索是信息检索的一个重要分支。在分析交互式跨语言信息检索过程、评价指标、用户行为进展等理论研究基础上,设计一个让用户参与跨语言信息检索全过程的用户检索实验。实验结果表明:用户检索词主要来自检索主题的标题;用户判断文档相关性的准确率较高;目标语言文档全文、译文摘要、译文全文都是用户认可的判断依据;翻译优化方法以及翻译优化与查询扩展的结合方法在用户交互环境下非常有效;用户对于反馈后的翻译仍然愿意做进一步选择;用户对于与跨语言信息检索系统进行交互是有需求并认可的。用户行为分析有助于指导交互式跨语言信息检索系统的设计与实践。  相似文献   

2.
Vocabulary incompatibilities arise when the terms used to index a document collection are largely unknown, or at least not well-known to the users who eventually search the collection. No matter how comprehensive or well-structured the indexing vocabulary, it is of little use if it is not used effectively in query formulation. This paper demonstrates that techniques for mapping user queries into the controlled indexing vocabulary have the potential to radically improve document retrieval performance. We also show how the use of controlled indexing vocabulary can be employed to achieve performance gains for collection selection. Finally, we demonstrate the potential benefit of combining these two techniques in an interactive retrieval environment. Given a user query, our evaluation approach simulates the human user's choice of terms for query augmentation given a list of controlled vocabulary terms suggested by a system. This strategy lets us evaluate interactive strategies without the need for human subjects.  相似文献   

3.
The paper presents several techniques for selecting noun phrases for interactive query expansion following pseudo-relevance feedback and a new phrase-based document ranking method. A combined syntactico-statistical method was used for the selection of phrases for query expansion. Several statistical measures of phrase selection were evaluated. Experiments were also conducted studying the effectiveness of noun phrases in document ranking. One of the major problems in phrase-based document retrieval is weighting of overlapping and non-contiguous word sequences in documents. The paper presents a new method of phrase weighting, which addressed this problem, and its evaluation on the TREC dataset.  相似文献   

4.
问答式信息检索是新一代搜索引擎,它接收自然语言描述的问题,在文档集合中搜索并返回问题的精确答案.问答式信息检索中,检索模块性能的提高将直接影响问题回答系统的整体性能.本文研究系统中的查询优化技术,包括两种策略:基于模式知识库的查询优化;挖掘Web语义蕴含信息,构建查询扩展资源.本文利用TREC提供的问题集与答案集(TREC8-TREC13)做实验来测试查询优化方法的性能,实验结果表明,相对于传统的查询生成,本文采用的查询优化技术在检索精度上取得了提高,t-test结果证明,系统性能提高统计显著.  相似文献   

5.
Multilingual retrieval (querying of multiple document collections each in a different language) can be achieved by combining several individual techniques which enhance retrieval: machine translation to cross the language barrier, relevance feedback to add words to the initial query, decompounding for languages with complex term structure, and data fusion to combine monolingual retrieval results from different languages. Using the CLEF 2001 and CLEF 2002 topics and document collections, this paper evaluates these techniques within the context of a monolingual document ranking formula based upon logistic regression. Each individual technique yields improved performance over runs which do not utilize that technique. Moreover the techniques are complementary, in that combining the best techniques outperforms individual technique performance. An approximate but fast document translation using bilingual wordlists created from machine translation systems is presented and evaluated. The fast document translation is as effective as query translation in multilingual retrieval. Furthermore, when fast document translation is combined with query translation in multilingual retrieval, the performance is significantly better than that of query translation or fast document translation.  相似文献   

6.
信息检索模型与逻辑理论   总被引:6,自引:0,他引:6  
杨建林 《情报学报》2000,19(5):514-518
本文从另外一个角度讨论信息检索模型。基于逻辑理论,本文定义了一种查询与知识集的等价关系,从逻辑的角度描述了查询与信息检索系统之间的关系、查询与作为检索结果的文献集之间的关系。  相似文献   

7.
8.
信息检索的逻辑模型   总被引:9,自引:1,他引:8  
杨建林 《情报学报》2000,19(4):338-341
本文建立了一个基于逻辑理论的信息检索模型。首先给出信息系统逻辑化的方法,然后借助逻辑程序的不动点语义,定义了查询与文献的相关程度。  相似文献   

9.
Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target languages in response to a user query in a single source language. In a multilingual federated search environment, different information sources contain documents in different languages. A general search strategy in multilingual federated search environments is to translate the user query to each language of the information sources and run a monolingual search in each information source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists from the information sources that are in different languages. This is known as the results merging problem for multilingual information retrieval. Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the other side, a more effective merging method was proposed to download and translate all retrieved documents into the source language and generate the final ranked list by running a monolingual search in the search client. The latter method is more effective but is associated with a large amount of online communication and computation costs. This paper proposes an effective and efficient approach for the results merging task of multilingual ranked lists. Particularly, it downloads only a small number of documents from the individual ranked lists of each user query to calculate comparable document scores by utilizing both the query-based translation method and the document-based translation method. Then, query-specific and source-specific transformation models can be trained for individual ranked lists by using the information of these downloaded documents. These transformation models are used to estimate comparable document scores for all retrieved documents and thus the documents can be sorted into a final ranked list. This merging approach is efficient as only a subset of the retrieved documents are downloaded and translated online. Furthermore, an extensive set of experiments on the Cross-Language Evaluation Forum (CLEF) () data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results merging algorithm with different transformation models. This paper also provides thorough experimental results as well as detailed analysis. All of the work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results of the cross-language evaluation forum-CLEF 2005, 2005).
Hao YuanEmail:
  相似文献   

10.
In this paper, which treats Swedish full text retrieval, the problem of morphological variation of query terms in the document database is studied. The Swedish CLEF 2003 test collection was used, and the effects of combination of indexing strategies with query terms on retrieval effectiveness were studied. Four of the seven tested combinations involved indexing strategies that used normalization, a form of conflation. All of these four combinations employed compound splitting, both during indexing and at query phase. SWETWOL, a morphological analyzer for the Swedish language, was used for normalization and compound splitting. A fifth combination used stemming, while a sixth attempted to group related terms by right hand truncation of query terms. The truncation was performed by a search expert. These six combinations were compared to each other and to a baseline combination, where no attempt was made to counteract the problem of morphological variation of query terms in the document database. Both the truncation combination, the four combinations based on normalization and the stemming combination outperformed the baseline. Truncation had the best performance. The main conclusion of the paper is that truncation, normalization and stemming enhanced retrieval effectiveness in comparison to the baseline. Further, normalization and stemming were not far below truncation.  相似文献   

11.
The TREC-5 Confusion Track: Comparing Retrieval Methods for Scanned Text   总被引:1,自引:1,他引:0  
A known-item search is a particular information retrieval task in which the system is asked to find a single target document in a large document set. The TREC-5 confusion track used a set of 49 known-item tasks to study the impact of data corruption on retrieval system performance. Two corrupted versions of a 55,600 document corpus whose true content was known were created by applying OCR techniques to page images. The first version of the corpus used the page images as scanned, resulting in an estimated character error rate of approximately 5%. The second version used page images that had been down-sampled, resulting in an estimated character error rate of approximately 20%. The true text and each of the corrupted versions were then searched using the same set of 49 questions. In general, retrieval methods that attempted a probabilistic reconstruction of the original clean text fared better than methods that simply accepted corrupted versions of the query text.  相似文献   

12.
In Information Retrieval, since it is hard to identify users’ information needs, many approaches have been tried to solve this problem by expanding initial queries and reweighting the terms in the expanded queries using users’ relevance judgments. Although relevance feedback is most effective when relevance information about retrieved documents is provided by users, it is not always available. Another solution is to use correlated terms for query expansion. The main problem with this approach is how to construct the term-term correlations that can be used effectively to improve retrieval performance. In this study, we try to construct query concepts that denote users’ information needs from a document space, rather than to reformulate initial queries using the term correlations and/or users’ relevance feedback. To form query concepts, we extract features from each document, and then cluster the features into primitive concepts that are then used to form query concepts. Experiments are performed on the Associated Press (AP) dataset taken from the TREC collection. The experimental evaluation shows that our proposed framework called QCM (Query Concept Method) outperforms baseline probabilistic retrieval model on TREC retrieval.  相似文献   

13.
Due to the heavy use of gene synonyms in biomedical text, people have tried many query expansion techniques using synonyms in order to improve performance in biomedical information retrieval. However, mixed results have been reported. The main challenge is that it is not trivial to assign appropriate weights to the added gene synonyms in the expanded query; under-weighting of synonyms would not bring much benefit, while overweighting some unreliable synonyms can hurt performance significantly. So far, there has been no systematic evaluation of various synonym query expansion strategies for biomedical text. In this work, we propose two different strategies to extend a standard language modeling approach for gene synonym query expansion and conduct a systematic evaluation of these methods on all the available TREC biomedical text collections for ad hoc document retrieval. Our experiment results show that synonym expansion can significantly improve the retrieval accuracy. However, different query types require different synonym expansion methods, and appropriate weighting of gene names and synonym terms is critical for improving performance.
Chengxiang ZhaiEmail:
  相似文献   

14.
As the volume and variety of information sources continues to grow, there is increasing difficulty with respect to obtaining information that accurately matches user information needs. A number of factors affect information retrieval effectiveness (the accuracy of matching user information needs against the retrieved information). First, users often do not present search queries in the form that optimally represents their information need. Second, the measure of a document’s relevance is often highly subjective between different users. Third, information sources might contain heterogeneous documents, in multiple formats and the representation of documents is not unified. This paper discusses an approach for improvement of information retrieval effectiveness from document databases. It is proposed that retrieval effectiveness can be improved by applying computational intelligence techniques for modelling information needs, through interactive reinforcement learning. The method combines qualitative (subjective) user relevance feedback with quantitative (algorithmic) measures of the relevance of retrieved documents. An information retrieval is developed whose retrieval effectiveness is evaluated using traditional precision and recall.  相似文献   

15.
在海量信息中检索时,与用户查询相关的信息常常被漏掉,而与查询无关的信息———信息垃圾,却大量地出现在检索结果中。改进文本信息检索系统的质量,提高检索效能,已成为亟待解决的问题。本文针对能够影响检索效力的一个易被忽略的因素———修饰语,研究其在文本信息检索中的作用。为此,构建了修正的向量空间模型(Modified Vector Space Model,MVSM),并以英文文本进行试验,进而说明修饰语的作用。  相似文献   

16.
On Collection Size and Retrieval Effectiveness   总被引:3,自引:0,他引:3  
The relationship between collection size and retrieval effectiveness is particularly important in the context of Web search. We investigate it first analytically and then experimentally, using samples and subsets of test collections. Different retrieval systems vary in how the score assigned to an individual document in a sample collection relates to the score it receives in the full collection; we identify four cases.We apply signal detection (SD) theory to retrieval from samples, taking into account the four cases and using a variety of shapes for relevant and irrelevant distributions. We note that the SD model subsumes several earlier hypotheses about the causes of the decreased precision in samples. We also discuss other models which contribute to an understanding of the phenomenon, particularly relating to the effects of discreteness. Different models provide complementary insights.Extensive use is made of test data, some from official submissions to the TREC-6 VLC track and some new, to illustrate the effects and test hypotheses. We empirically confirm predictions, based on SD theory, that P@n should decline when moving to a sample collection and that average precision and R-precision should remain constant. SD theory suggests the use of recall-fallout plots as operating characteristic (OC) curves. We plot OC curves of this type for a real retrieval system and query set and show that curves for sample collections are similar but not identical to the curve for the full collection.  相似文献   

17.
Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do, however, not consider the additional information and rich annotations provided by the structure of XML documents and their element names.This article presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match as well as semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java classes and servlets. Experiments in the context of the INEX benchmark demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.  相似文献   

18.
Content-only queries in hierarchically structured documents should retrieve the most specific document nodes which are exhaustive to the information need. For this problem, we investigate two methods of augmentation, which both yield high retrieval quality. As retrieval effectiveness, we consider the ratio of retrieval quality and response time; thus, fast approximations to the 'correct' retrieval result may yield higher effectiveness. We present a classification scheme for algorithms addressing this issue, and adopt known algorithms from standard document retrieval for XML retrieval. As a new strategy, we propose incremental-interruptible retrieval, which allows for instant presentation of the top ranking documents. We develop a new algorithm implementing this strategy and evaluate the different methods with the INEX collection.  相似文献   

19.
In this paper, we propose a new term dependence model for information retrieval, which is based on a theoretical framework using Markov random fields. We assume two types of dependencies of terms given in a query: (i) long-range dependencies that may appear for instance within a passage or a sentence in a target document, and (ii) short-range dependencies that may appear for instance within a compound word in a target document. Based on this assumption, our two-stage term dependence model captures both long-range and short-range term dependencies differently, when more than one compound word appear in a query. We also investigate how query structuring with term dependence can improve the performance of query expansion using a relevance model. The relevance model is constructed using the retrieval results of the structured query with term dependence to expand the query. We show that our term dependence model works well, particularly when using query structuring with compound words, through experiments using a 100-gigabyte test collection of web documents mostly written in Japanese. We also show that the performance of the relevance model can be significantly improved by using the structured query with our term dependence model.
Koji EguchiEmail:
  相似文献   

20.
调研UMLS构成和建设特点,重点研究UMLS在检索方面的应用实例,分析归纳UMLS在语义化、智能化检索方面的功能设计、实现方法与实际效果,以期为基于集成式知识组织系统的智能检索应用的场景功能设计、技术开发和实现,提供借鉴和参考。UMLS在智能检索中的应用主要包括:(1)扩展检索,主要有同义词扩展、等级结构扩展和词组切分扩展等方法;(2)语义检索,基于概念和概念之间的关系进行检索和结果内容表达;(3)问答式检索,包括问题分析、文献检索、语句提取、答案生成和语义聚类。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号