Similar literature
1.
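Construction of an English-Chinese cross-language retrieval model based on topic maps   Total citations: 4 (self-citations: 3, citations by others: 1)
Existing cross-language retrieval models commonly suffer from poor translation accuracy, low efficiency, and high cost. Building on an in-depth analysis of the strengths of topic map technology in revealing semantic relations between term concepts and in supporting multiple languages, this paper proposes an English-Chinese cross-language retrieval model based on topic maps, which implements cross-language retrieval through an index translation strategy. The model's distinguishing feature is that it improves translation accuracy while effectively reducing translation cost. It is also relatively simple to implement.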

2.
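Faced with ever-growing multilingual information resources, cross-language information retrieval has become a key technology for global knowledge access and sharing. This paper builds a practical query translation interface for cross-language retrieval that can be easily embedded in any information retrieval platform, extending the platform's multilingual processing capability. The interface uses several translation disambiguation strategies, including longest-phrase matching, query classification, and a probabilistic dictionary. It is evaluated in terms of both query translation accuracy and running efficiency, and the experimental results confirm that the approach is feasible.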
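As a rough illustration of two of the strategies named in the abstract above (longest-phrase matching and a probabilistic dictionary), here is a minimal Python sketch. The dictionary entries, probabilities, and the function name `translate_query` are hypothetical and are not taken from the paper. Query classification is omitted because the abstract does not describe how it works.

```python
from typing import Dict, List

# Hypothetical toy probabilistic dictionary (not from the paper):
# source phrase -> {candidate translation: probability}.
PROB_DICT: Dict[str, Dict[str, float]] = {
    "information retrieval": {"信息检索": 0.9, "情报检索": 0.1},
    "information": {"信息": 0.7, "情报": 0.3},
    "retrieval": {"检索": 0.8, "取回": 0.2},
    "cross language": {"跨语言": 1.0},
}


def translate_query(
    query: str,
    dictionary: Dict[str, Dict[str, float]] = PROB_DICT,
    max_len: int = 4,
) -> List[str]:
    """Greedy longest-phrase lookup, keeping the most probable translation."""
    tokens = query.lower().split()
    translated: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                candidates = dictionary[phrase]
                translated.append(max(candidates, key=candidates.get))
                i += n
                break
        else:
            # Out-of-vocabulary token: pass it through untranslated.
            translated.append(tokens[i])
            i += 1
    return translated
```

Matching "information retrieval" as one phrase avoids translating "information" and "retrieval" separately, which is where a word-by-word dictionary lookup tends to introduce ambiguity.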

3.
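To address the heterogeneity, multilinguality, scalability, and knowledge management problems that arise when building knowledge bases for collaborative digital reference systems, and based on an in-depth analysis of topic maps and OAI technology, this paper proposes a new topic-map-based method for building such knowledge bases. The method uses OAI for metadata mapping and harvesting to achieve interoperability, which addresses heterogeneity. It uses topic maps to organize and manage the knowledge base, enabling cross-language retrieval and timely updates, which addresses the multilingual, scalability, and knowledge management problems.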

4.
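Research on personalized digital library services based on context models   Total citations: 5 (self-citations: 0, citations by others: 5)
Because users' personalized needs in the web environment are volatile, dynamic, and context-sensitive, providing personalized services increasingly requires the support of context models. At present, context-model-based personalized services in digital libraries are delivered mainly through personalized retrieval and personalized recommendation. Applying context models to personalized digital library services also requires attention to the accuracy of user context capture, the applicability of the services provided, user security and privacy, the combination of user models and context models, and the integration of the digital library's various resources. 3 figures.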

5.
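Many studies have examined cross-language and multilingual information retrieval and proposed a variety of methods, particularly for query translation. Most of them, however, treat cross-language retrieval as two separate steps: query translation and monolingual retrieval. Multilingual information retrieval adds a further result-merging step. In this paper we propose an integrated retrieval approach that treats query translation as part of the overall retrieval process. This integrated approach combines the uncertainties of translation and retrieval, which yields better overall optimization, and it allows monolingual retrieval methods to be applied to cross-language and multilingual retrieval.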

6.
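Improving the quality of multilingual information services has become an important research problem for digital libraries and other scientific and technical information services. The article first reviews related research on multilingual information services in China and abroad. It then describes, from the two perspectives of cross-language information retrieval and machine translation, how the multilingual information service research results of the National Science and Technology Literature Center (国家科技文献中心) have been applied in the national science and technology literature online service system. Introducing cross-language retrieval and abstract translation into a digital library's online query system is still an exploratory effort in China's digital library information service field, and it can provide experience for further improving the quality of multilingual information services in digital libraries.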

7.
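Large numbers of heterogeneous systems are distributed across the network. Providing information users with effective access to information resources over the network and improving retrieval efficiency are important indicators in digital library construction. Starting from the problems facing traditional digital library systems, this paper examines how the Semantic Web can support digital libraries and, by integrating social networks with semantically annotated multimedia resources, constructs a digital library model based on the Semantic Web. Introducing ontologies enriches the semantics, and applying faceted search improves the retrieval efficiency of the model system. Simulation experiments show that the model is effective.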

8.
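Building on existing digital library information retrieval systems, and targeting the low precision and recall of their search results, this paper combines intelligent interactive retrieval technology with CLIR technology to design a cross-language interactive retrieval model and applies it in a digital library system.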

9.
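[Purpose/Significance] Making effective use of shared multilingual "Belt and Road" database resources requires solving the cross-language retrieval problem. Based on a survey of the retrieval functions of existing "Belt and Road" databases, this paper analyzes the retrieval requirements of shared multilingual "Belt and Road" databases and, from the perspective of a survey of cross-language retrieval platforms, provides a reference for designing and developing their cross-language retrieval functions. [Method/Process] Using literature review and web-based investigation, 11 typical cross-language retrieval platforms in China and abroad were selected and analyzed in six respects: cross-language retrieval method, implementation of cross-language translation, retrieval function settings, presentation of search results, interface, and languages supported for retrieval. Their implementation methods are summarized. [Result/Conclusion] The paper proposes a strategy for designing and developing cross-language retrieval functions for shared multilingual "Belt and Road" databases: adopt a query-document translation approach based on neural machine translation, implement multiple retrieval functions, present search results with visualization techniques, and provide multilingual search interfaces and resources.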

10.
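Query translation for cross-language retrieval based on bilingual dictionaries has inherent translation ambiguity problems such as one-to-many mappings. Existing work on these problems has drawbacks: it cannot accurately translate non-compositional compounds, and combining a bilingual dictionary with other translation resources adds considerable computational overhead. To build a practical bidirectional English-Chinese cross-language retrieval system, and starting from an existing bilingual scientific and technical dictionary containing a number of technical terms and phrases, this work focuses on how to use parallel corpora to improve the existing bilingual dictionary. The goal is to generate a scientific and technical bilingual probabilistic dictionary from a sentence-aligned parallel corpus, providing real-time support for query translation disambiguation in cross-language retrieval.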

11.
In this paper, we study different applications of cross-language latent topic models trained on comparable corpora. The first focus lies on the task of cross-language information retrieval (CLIR). The Bilingual Latent Dirichlet Allocation model (BiLDA) allows us to create an interlingual, language-independent representation of both queries and documents. We construct several BiLDA-based document models for CLIR, where no additional translation resources are used. The second focus lies on the methods for extracting translation candidates and semantically related words using only per-topic word distributions of the cross-language latent topic model. As the main contribution, we combine the two former steps, blending the evidence from the per-document topic distributions and the per-topic word distributions of the topic model with the knowledge from the extracted lexicon. We design and evaluate the novel evidence-rich statistical model for CLIR, and show that such a model, which combines various (only internal) sources of evidence, obtains the best scores for experiments performed on the standard test collections of the CLEF 2001–2003 campaigns. We confirm these findings in an alternative evaluation, where we automatically generate queries and perform the known-item search on a test subset of Wikipedia articles. The main importance of this work lies in the fact that we train translation resources from comparable document-aligned corpora and provide novel CLIR statistical models that exhaustively exploit as many cross-lingual clues as possible in the quest for better CLIR results, without the use of any additional external resources such as parallel corpora or machine-readable dictionaries.
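The abstract above describes scoring documents through shared topics rather than through translation. A minimal sketch of that idea follows. It assumes the standard topic-model query likelihood, in which each query word is scored by summing over topics the product of the document's topic probability and the topic's word probability in the query language. All distributions and names below are invented toy values, not the authors' trained BiLDA model or their exact formula.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical per-topic word distributions for the QUERY language.
# In a bilingual topic model each shared topic also has a distribution over
# the document language, which is what ties the two languages together.
TOPIC_WORDS_QUERY_LANG: List[Dict[str, float]] = [
    {"music": 0.5, "concert": 0.5},   # topic 0
    {"football": 0.6, "goal": 0.4},   # topic 1
]

# Hypothetical per-document topic distributions for documents written in the
# other language, as they would be inferred by the topic model.
DOC_TOPICS: Dict[str, List[float]] = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.2, 0.8],
}


def query_log_likelihood(
    query_terms: Sequence[str],
    doc_topic_dist: Sequence[float],
    topic_words: Sequence[Dict[str, float]],
    floor: float = 1e-12,
) -> float:
    """log P(query | doc), with P(w | doc) = sum_k P(w | topic_k) * P(topic_k | doc)."""
    score = 0.0
    for word in query_terms:
        p_word = sum(
            p_topic * topic_words[k].get(word, 0.0)
            for k, p_topic in enumerate(doc_topic_dist)
        )
        # Floor the probability so unseen words do not produce log(0).
        score += math.log(max(p_word, floor))
    return score


def rank(query_terms: Sequence[str]) -> List[str]:
    """Return document ids ordered from most to least likely."""
    return sorted(
        DOC_TOPICS,
        key=lambda d: query_log_likelihood(query_terms, DOC_TOPICS[d], TOPIC_WORDS_QUERY_LANG),
        reverse=True,
    )
```

Because the query and the documents meet only in topic space, no dictionary or translation step appears anywhere in the scoring function.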

12.
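This paper reviews the state of research on named entity recognition and translation, proposes a named entity recognition and translation method based on information extraction together with a series of integrated optimizations of the method, and carries out cross-language information retrieval experiments based on named entity recognition and translation. The experimental results show the importance of named entity recognition and translation in cross-language information retrieval, and demonstrate that the proposed translation weighting and web mining of out-of-vocabulary named entities significantly improve cross-language retrieval performance.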

13.
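Design and implementation of an English-Chinese interactive cross-language retrieval system   Total citations: 1 (self-citations: 0, citations by others: 1)
To address query translation ambiguity in cross-language information retrieval, this paper applies an interactive system design approach to study and analyze cross-language retrieval techniques based on relevance feedback. It proposes an English-Chinese interactive cross-language information retrieval system that implements relevance feedback functions such as user-assisted query translation, multi-level user relevance judgments, and translation optimization with query expansion. The results show a marked improvement in retrieval effectiveness.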

14.
This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The results show that the proposed regression model can explain about 60% of the variation in CLIR performance, and that WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed.
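The two-factor idea in the abstract above can be sketched as an ordinary least-squares fit with two predictors. The abstract does not give the functional form, so the plain linear model used here is an assumption. The data are synthetic values generated from a known linear rule purely to exercise the code. They are not results from the study, and every name below is hypothetical.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_clir_regression(
    mono_ap: Sequence[float],
    translation_quality: Sequence[float],
    clir_ap: Sequence[float],
) -> Tuple[List[float], float]:
    """Fit clir_ap ~ b0 + b1 * mono_ap + b2 * translation_quality.

    Returns the coefficients [b0, b1, b2] and the R^2 of the fit.
    """
    rows = [[1.0, m, q] for m, q in zip(mono_ap, translation_quality)]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, clir_ap)) for i in range(3)]
    coef = solve(xtx, xty)
    predicted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(clir_ap) / len(clir_ap)
    ss_res = sum((y - p) ** 2 for y, p in zip(clir_ap, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in clir_ap)
    return coef, 1.0 - ss_res / ss_tot


# Synthetic per-query values (NOT from the study), generated from a known rule.
MONO_AP = [0.2, 0.5, 0.8, 0.4, 0.6]
QUALITY = [0.3, 0.1, 0.6, 0.9, 0.2]   # stands in for BLEU or WAMU
CLIR_AP = [0.05 + 0.6 * m + 0.2 * q for m, q in zip(MONO_AP, QUALITY)]

COEF, R_SQUARED = fit_clir_regression(MONO_AP, QUALITY, CLIR_AP)
```

On real per-query data the R² returned by such a fit corresponds to the "share of variation explained" that the abstract reports. Comparing the R² obtained with one quality metric against another is one simple way to compare their predictive power.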

15.
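This paper analyzes the basic modes of cross-language information retrieval and the key techniques of translation disambiguation. It disambiguates and optimizes query translation with a method based on word-pair co-occurrence rates weighted by the distance between words. On this basis it builds a cross-language product information retrieval system and applies it to book product search. Experimental results show that both translation quality and retrieval effectiveness improve.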

16.
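Qiu Yue (邱悦). 《图书情报工作》 (Library and Information Service), 2006, 50(10): 82-86
The article argues that the diversity of languages on the web and among users has made cross-language information retrieval an important research area. The techniques used in this area mainly include methods based on machine translation, machine-readable bilingual dictionaries, thesauri, and parallel corpora. Beyond these core techniques, implementing cross-language information retrieval also requires the support of query expansion.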

17.
Prior-art search in patent retrieval is concerned with finding all existing patents relevant to a patent application. Since patents often appear in different languages, cross-language information retrieval (CLIR) is an essential component of effective patent search. In recent years machine translation (MT) has become the dominant approach to translation in CLIR. Standard MT systems focus on generating proper translations that are morphologically and syntactically correct. Development of effective MT systems of this type requires large training resources and high computational power for training and translation. This is an important issue for patent CLIR, where queries are typically very long, sometimes taking the form of a full patent application, meaning that query translation using MT systems can be very slow. However, in contrast to MT, the focus for information retrieval (IR) is on the conceptual meaning of the search words regardless of their surface form or the linguistic structure of the output. Thus much of the complexity of MT is not required for effective CLIR. We present an adapted MT technique specifically designed for CLIR. In this method, IR text pre-processing in the form of stop word removal and stemming is applied to the MT training corpus prior to the training phase. Applying this step leads to a significant decrease in the MT computational and training resource requirements. Experimental application of the new approach to the cross-language patent retrieval task from CLEF-IP 2010 shows the new technique to be up to 23 times faster than standard MT for query translation, while maintaining IR effectiveness statistically indistinguishable from standard MT when large training resources are used. Furthermore, the new method is significantly better than standard MT when only limited translation training resources are available, which can be a significant issue for translation in specialized domains. The new MT technique also enables patent document translation in a practical amount of time, with a resulting significant improvement in retrieval effectiveness.
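The pre-processing step described in the abstract above, stop word removal followed by stemming applied to training text before MT training, can be sketched as below. The stop word list and the crude suffix stripper are stand-ins chosen for brevity. The abstract does not say which stop list or stemmer the authors used, and a real setup would more likely use an established stemmer such as Porter's.

```python
from typing import List

# Hypothetical, deliberately tiny stop word list (assumption, not from the paper).
STOP_WORDS = {"a", "an", "the", "of", "is", "are", "and", "to", "in"}

# Crude suffix stripping used as a stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence: str) -> List[str]:
    """Lowercase, split on whitespace, drop stop words, then stem."""
    tokens = sentence.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def preprocess_corpus(sentences: List[str]) -> List[List[str]]:
    """Apply the same pre-processing to every training sentence."""
    return [preprocess(s) for s in sentences]
```

Shrinking the vocabulary and shortening sentences in this way is what plausibly reduces training and translation cost: the translation model no longer has to learn word order or inflection that a retrieval engine would discard anyway.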

18.
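Interactive cross-language information retrieval is an important branch of information retrieval. Based on an analysis of theoretical work on the interactive cross-language retrieval process, evaluation measures, and progress in user behavior research, this paper designs a user retrieval experiment in which users take part in the entire cross-language retrieval process. The results show that users' search terms come mainly from the titles of the search topics; users judge document relevance with fairly high accuracy; the full text of target-language documents, translated abstracts, and translated full texts are all accepted by users as a basis for judgment; translation optimization, and its combination with query expansion, are highly effective in an interactive setting; users remain willing to make further choices among translations after feedback; and users both need and accept interaction with cross-language retrieval systems. User behavior analysis helps guide the design and practice of interactive cross-language information retrieval systems.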

19.
This paper reviews literature on dictionary-based cross-language information retrieval (CLIR) and presents CLIR research done at the University of Tampere (UTA). The main problems associated with dictionary-based CLIR, as well as appropriate methods to deal with the problems, are discussed. We will present the structured query model by Pirkola and report findings for four different language pairs concerning the effectiveness of query structuring. The architecture of our automatic query translation and construction system is presented.
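The abstract above does not spell out the structured query model. As it is commonly described, all dictionary translations of one source term are grouped as synonyms, so the group's term frequency is the sum of its members' frequencies and its document frequency counts documents containing any member. The sketch below illustrates that reading with a plain tf-idf weight. The weighting formula, the toy documents, and all names are assumptions, not the UTA system.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy target-language collection.
DOCS: Dict[str, List[str]] = {
    "d1": ["search", "engine", "retrieval"],
    "d2": ["retrieval", "system"],
    "d3": ["library"],
}


def synonym_tf(translations: Set[str], tokens: List[str]) -> int:
    """Term frequency of a synonym group: sum over all member translations."""
    return sum(tokens.count(t) for t in translations)


def synonym_df(translations: Set[str], docs: Dict[str, List[str]]) -> int:
    """Document frequency of a group: documents containing at least one member."""
    return sum(1 for tokens in docs.values() if synonym_tf(translations, tokens) > 0)


def score(structured_query: List[Set[str]], doc_id: str, docs: Dict[str, List[str]]) -> float:
    """Sum of tf * log(N / df) over synonym groups (a simple tf-idf assumption)."""
    total = 0.0
    for group in structured_query:
        df = synonym_df(group, docs)
        if df == 0:
            continue  # no translation of this source term occurs in the collection
        total += synonym_tf(group, docs[doc_id]) * math.log(len(docs) / df)
    return total


# One source term with two dictionary translations forms one synonym group.
QUERY: List[Set[str]] = [{"search", "retrieval"}]
```

Grouping keeps a source term with many translations from dominating the score: however many alternatives the dictionary lists, the term contributes one group with one shared document frequency instead of several independently weighted keys.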
