首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
面对日益膨胀的多语种信息资源,跨语言信息检索已成为实现全球知识存取和共享的关键技术手段。构建一个实用型的跨语言检索查询翻译接口,可方便地嵌入任意的信息检索平台,扩展现有信息检索平台的多语言信息处理能力。该查询翻译接口采用基于最长短语、查询分类和概率词典等多种翻译消歧策略,并从查询翻译的准确性和接口的运行效率两个角度对构建的查询翻译接口进行评测,实验结果验证所采用方法具有可行性。  相似文献   

2.
交互式跨语言信息检索是信息检索的一个重要分支。在分析交互式跨语言信息检索过程、评价指标、用户行为进展等理论研究基础上,设计一个让用户参与跨语言信息检索全过程的用户检索实验。实验结果表明:用户检索词主要来自检索主题的标题;用户判断文档相关性的准确率较高;目标语言文档全文、译文摘要、译文全文都是用户认可的判断依据;翻译优化方法以及翻译优化与查询扩展的结合方法在用户交互环境下非常有效;用户对于反馈后的翻译仍然愿意做进一步选择;用户对于与跨语言信息检索系统进行交互是有需求并认可的。用户行为分析有助于指导交互式跨语言信息检索系统的设计与实践。  相似文献   

3.
We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document level alignments, where documents of different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has multiple interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). Moreover, we also perform pseudo relevance feedback on the alignments to improve our retrieval results. And finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages and has performed very well at the TREC-7 conference CLIR system comparison.  相似文献   

4.
A usual strategy to implement CLIR (Cross-Language Information Retrieval) systems is the so-called query translation approach. The user query is translated for each language present in the multilingual collection in order to compute an independent monolingual information retrieval process per language. Thus, this approach divides documents according to language. In this way, we obtain as many different collections as languages. After searching in these corpora and obtaining a result list per language, we must merge them in order to provide a single list of retrieved articles. In this paper, we propose an approach to obtain a single list of relevant documents for CLIR systems driven by query translation. This approach, which we call 2-step RSV (RSV: Retrieval Status Value), is based on the re-indexing of the retrieval documents according to the query vocabulary, and it performs noticeably better than traditional methods. The proposed method requires query vocabulary alignment: given a word for a given query, we must know the translation or translations to the other languages. Because this is not always possible, we have researched on a mixed model. This mixed model is applied in order to deal with queries with partial word-level alignment. The results prove that even in this scenario, 2-step RSV performs better than traditional merging methods.  相似文献   

5.
The number of Web users whose first language is not English continues to grow, as does the amount of content provided in languages other than English. This poses new challenges for actors on the Web, such as in which language(s) content should be offered, how search tools should deal with mono- and multilingual content, and how users can make the best use of navigation and search options, suited to their individual linguistic skills. How should these challenges be dealt with? Technological approaches to non-English (or in general, cross-language) Web search have made large progress; however, translation remains a hard problem. This precludes a low-cost but high-quality blanket all-language coverage of the whole Web. In this paper, we propose a user-centric approach to answering questions of where to best concentrate efforts and investments. Drawing on linguistic research, we describe data on the availability of content and access to it in first and second languages across the Web. We then present three studies that investigated the impact of the availability (or not) of first-language content and access forms on user behaviour and attitudes. The results indicate that non-English languages are under-represented on the Web and that this is partly due to content-creation, link-setting and link-following behaviour. They also show that user satisfaction is influenced both by the cognitive effort of searching and the availability of alternative information in that language. These findings suggest that more cross-language tools are desirable. However, they also indicate that context (such as user groups’ domain expertise or site type) should be considered when tradeoffs between information quality and multilinguality need to be taken into account.  相似文献   

6.
要实现网络信息或数字图书馆信息的有效多语言获取,需充分考虑用户交互.通过用户实验,检验用户相关反馈机制在多语言信息获取中的作用,并分析用户行为特点.实验结果证明,查询扩展、翻译优化以及两者的结合均是有效的用户相关反馈方法.  相似文献   

7.
This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The result shows that the proposed regression model can explain about 60% of the variation in CLIR performance, and WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed.  相似文献   

8.
In this study the basic framework and performance analysis results are presented for the three year long development process of the dictionary-based UTACLIR system. The tests expand from bilingual CLIR for three language pairs Swedish, Finnish and German to English, to six language pairs, from English to French, German, Spanish, Italian, Dutch and Finnish, and from bilingual to multilingual. In addition, transitive translation tests are reported. The development process of the UTACLIR query translation system will be regarded from the point of view of a learning process. The contribution of the individual components, the effectiveness of compound handling, proper name matching and structuring of queries are analyzed. The results and the fault analysis have been valuable in the development process. Overall the results indicate that the process is robust and can be extended to other languages. The individual effects of the different components are in general positive. However, performance also depends on the topic set and the number of compounds and proper names in the topic, and to some extent on the source and target language. The dictionaries used affect the performance significantly.  相似文献   

9.
Given a user question, the goal of a Question Answering (QA) system is to retrieve answers rather than full documents or even best-matching passages, as most Information Retrieval systems currently do. In this paper, we present BRUJA, a QA system for the management of multilingual collections. BRUJ rkstions (English, Spanish and French). The BRUJA architecture is not formed with three monolingual QA systems but instead uses English as Interlingua to make usual QA tasks such as question classifications and answer extractions. In addition, BRUJA uses Cross Language Information Retrieval (CLIR) techniques to retrieve relevant documents from a multilingual collection. On the one hand, we have more documents to find answers from but on the other hand, we are introducing noise into the system because of translations to the Interlingua (English) and the CLIR module. The question is whether the difficulty of managing three languages is worth it or whether a monolingual QA system delivers better results. We report on in-depth experimentation and demonstrate that our multilingual QA system gets better results than its monolingual counterpart whenever it uses good translation resources and, especially, CLIR techniques that are state-of-the-art.  相似文献   

10.
面对基于双语词典的跨语言检索查询翻译方法中固有的一对多等翻译模糊问题,已有研究成果存在对于非组合型复合词无法进行准确翻译、双语词典和其他翻译资源联合使用引入较大计算开销等弊端。为建立英汉双向跨语言检索实用性系统,在现有的一部包含若干科技词汇和短语的双语科技词典的基础上,着重研究如何引入平行语料来改进已有的双语词典问题。目标是生成一部基于句对齐平行语料的科技类双语概率词典,为跨语言检索查询翻译消歧提供实时性支持。  相似文献   

11.
Prior-art search in patent retrieval is concerned with finding all existing patents relevant to a patent application. Since patents often appear in different languages, cross-language information retrieval (CLIR) is an essential component of effective patent search. In recent years machine translation (MT) has become the dominant approach to translation in CLIR. Standard MT systems focus on generating proper translations that are morphologically and syntactically correct. Development of effective MT systems of this type requires large training resources and high computational power for training and translation. This is an important issue for patent CLIR where queries are typically very long sometimes taking the form of a full patent application, meaning that query translation using MT systems can be very slow. However, in contrast to MT, the focus for information retrieval (IR) is on the conceptual meaning of the search words regardless of their surface form, or the linguistic structure of the output. Thus much of the complexity of MT is not required for effective CLIR. We present an adapted MT technique specifically designed for CLIR. In this method IR text pre-processing in the form of stop word removal and stemming are applied to the MT training corpus prior to the training phase. Applying this step leads to a significant decrease in the MT computational and training resources requirements. Experimental application of the new approach to the cross language patent retrieval task from CLEF-IP 2010 shows that the new technique to be up to 23 times faster than standard MT for query translations, while maintaining IR effectiveness statistically indistinguishable from standard MT when large training resources are used. Furthermore the new method is significantly better than standard MT when only limited translation training resources are available, which can be a significant issue for translation in specialized domains. The new MT technique also enables patent document translation in a practical amount of time with a resulting significant improvement in the retrieval effectiveness.  相似文献   

12.
综述命名实体识别与翻译研究现状,提出基于信息抽取的命名实体识别与翻译方法,以及对该方法进行一系列集成优化处理,并实现了基于命名实体识别与翻译的跨语言信息检索实验。实验结果显示出命名实体识别与翻译在跨语言信息检索中的重要性,并证明了所提出的翻译加权和网络挖掘未登录命名实体方法的应用能显著提高跨语言信息检索的性能。  相似文献   

13.
张彦文 《图书情报工作》2014,58(14):139-147
认为多语言信息存取(MLIA)是数字图书馆面临的一个重要问题,而跨语言信息搜索(CLIR)则是MLIA的主要应用,CLIR中以用户为中心的研究相比以技术为中心的研究少但正日益受到重视。目前CLIR中的用户需求、用户行为、用户体验或用户满意度等的定性或定量研究已发展为交互式跨语言信息搜索即interactive CLIR(iCLIR)。从系统的角度对国内外主要的iCLIR研究进行对比,揭示其用户交互技术策略,分析其翻译消歧、查询优化等核心技术,预测未来iCLIR的发展趋势。  相似文献   

14.
Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target languages in response to a user query in a single source language. In a multilingual federated search environment, different information sources contain documents in different languages. A general search strategy in multilingual federated search environments is to translate the user query to each language of the information sources and run a monolingual search in each information source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists from the information sources that are in different languages. This is known as the results merging problem for multilingual information retrieval. Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the other side, a more effective merging method was proposed to download and translate all retrieved documents into the source language and generate the final ranked list by running a monolingual search in the search client. The latter method is more effective but is associated with a large amount of online communication and computation costs. This paper proposes an effective and efficient approach for the results merging task of multilingual ranked lists. Particularly, it downloads only a small number of documents from the individual ranked lists of each user query to calculate comparable document scores by utilizing both the query-based translation method and the document-based translation method. Then, query-specific and source-specific transformation models can be trained for individual ranked lists by using the information of these downloaded documents. These transformation models are used to estimate comparable document scores for all retrieved documents and thus the documents can be sorted into a final ranked list. This merging approach is efficient as only a subset of the retrieved documents are downloaded and translated online. Furthermore, an extensive set of experiments on the Cross-Language Evaluation Forum (CLEF) () data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results merging algorithm with different transformation models. This paper also provides thorough experimental results as well as detailed analysis. All of the work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results of the cross-language evaluation forum-CLEF 2005, 2005).
Hao YuanEmail:
  相似文献   

15.
邱悦 《图书情报工作》2006,50(10):82-86
认为网络语言和用户语言的多样化使跨语言信息检索成为一个重要的研究领域,该领域所采用的技术主要包括基于机器翻译的方法、基于机读双语词典的方法、基于主题词表的方法以及基于平行语料库的方法。跨语言信息检索的实现除以技术为基础外,还需要查询扩展技术的辅助。  相似文献   

16.
17.
With the increasing availability of machine-readable bilingual dictionaries, dictionary-based automatic query translation has become a viable approach to Cross-Language Information Retrieval (CLIR). In this approach, resolving term ambiguity is a crucial step. We propose a sense disambiguation technique based on a term-similarity measure for selecting the right translation sense of a query term. In addition, we apply a query expansion technique which is also based on the term similarity measure to improve the effectiveness of the translation queries. The results of our Indonesian to English and English to Indonesian CLIR experiments demonstrate the effectiveness of the sense disambiguation technique. As for the query expansion technique, it is shown to be effective as long as the term ambiguity in the queries has been resolved. In the effort to solve the term ambiguity problem, we discovered that differences in the pattern of word-formation between the two languages render query translations from one language to the other difficult.  相似文献   

18.
We describe the Eurospider component for Cross-Language Information Retrieval (CLIR) that has been employed for experiments at all three CLEF campaigns to date. The central aspect of our efforts is the use of combination approaches, effectively combining multiple language pairs, translation resources and translation methods into one multilingual retrieval system. We discuss the implications of building a system that allows flexible combination, give details of the various translation resources and methods, and investigate the impact of merging intermediate results generated by the individual steps. An analysis of the resulting combination system is given which also takes into account additional requirements when deploying the system as a component in an operational, commercial setting.  相似文献   

19.
Cross-language information retrieval (CLIR) has so far been studied with the assumption that some rich linguistic resources such as bilingual dictionaries or parallel corpora are available. But creation of such high quality resources is labor-intensive and they are not always at hand. In this paper we investigate the feasibility of using only comparable corpora for CLIR, without relying on other linguistic resources. Comparable corpora are text documents in different languages that cover similar topics and are often naturally attainable (e.g., news articles published in different languages at the same time period). We adapt an existing cross-lingual word association mining method and incorporate it into a language modeling approach to cross-language retrieval. We investigate different strategies for estimating the target query language models. Our evaluation results on the TREC Arabic–English cross-lingual data show that the proposed method is effective for the CLIR task, demonstrating that it is feasible to perform cross-lingual information retrieval with just comparable corpora.  相似文献   

20.
英汉交互式跨语言检索系统设计与实现   总被引:1,自引:0,他引:1  
针对跨语言信息检索的查询翻译歧义性问题,采用交互式系统开发设计方法,对基于相关反馈的跨语言信息检索技术进行研究和分析,提出一个英汉交互式跨语言信息检索系统,实现用户辅助查询翻译、多级用户相关性判断,以及翻译优化与查询扩展等相关反馈功能,结果明显提高了检索效果。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号