Similar literature
 Found 10 similar documents (search time: 131 ms)
1.
Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target languages in response to a user query in a single source language. In a multilingual federated search environment, different information sources contain documents in different languages. A general search strategy in such environments is to translate the user query into the language of each information source and run a monolingual search in each source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists from the information sources, which are in different languages. This is known as the results merging problem for multilingual information retrieval. Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the other hand, a more effective merging method was proposed that downloads and translates all retrieved documents into the source language and generates the final ranked list by running a monolingual search in the search client. The latter method is more effective but incurs large online communication and computation costs. This paper proposes an effective and efficient approach for the results merging task of multilingual ranked lists. In particular, it downloads only a small number of documents from the individual ranked lists of each user query and calculates comparable document scores for them by utilizing both the query-based translation method and the document-based translation method. Then, query-specific and source-specific transformation models can be trained for the individual ranked lists by using the information from these downloaded documents. These transformation models are used to estimate comparable document scores for all retrieved documents, and thus the documents can be sorted into a final ranked list.
This merging approach is efficient because only a subset of the retrieved documents is downloaded and translated online. Furthermore, an extensive set of experiments on the Cross-Language Evaluation Forum (CLEF) data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results merging algorithm with different transformation models. This paper also provides thorough experimental results as well as detailed analysis. All of the work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results of the cross-language evaluation forum-CLEF 2005, 2005).
Hao Yuan
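The merging idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the transformation models, so a per-query, per-source linear model fitted by least squares is assumed here, and the comparable scores of the few downloaded documents are taken as given inputs. The names `fit_line` and `merge_ranked_lists` are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # degenerate case: all sampled source scores are equal
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def merge_ranked_lists(
    ranked_lists: Dict[str, List[Tuple[str, float]]],
    comparable_scores: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Merge the ranked lists returned for ONE query.

    ranked_lists: source -> [(doc_id, source-specific score), ...]
    comparable_scores: source -> {doc_id: comparable score} for the small
        set of downloaded documents (assumed to be computed elsewhere).
    """
    merged = []
    for source, docs in ranked_lists.items():
        sample = comparable_scores[source]
        xs = [score for doc_id, score in docs if doc_id in sample]
        ys = [sample[doc_id] for doc_id, _ in docs if doc_id in sample]
        # Assumed transformation model: linear map, refitted per query and source.
        a, b = fit_line(xs, ys)
        for doc_id, score in docs:
            merged.append((doc_id, a * score + b))
    # Sort all documents by their estimated comparable score.
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged


ranked = {
    "en": [("e1", 10.0), ("e2", 8.0), ("e3", 6.0)],
    "fr": [("f1", 0.9), ("f2", 0.5), ("f3", 0.1)],
}
sampled = {
    "en": {"e1": 0.5, "e2": 0.4},
    "fr": {"f1": 0.8, "f2": 0.45},
}
final_list = merge_ranked_lists(ranked, sampled)
```

Only two documents per source carry comparable scores in this toy input, yet every retrieved document receives an estimated score, which is the source of the efficiency gain the abstract describes.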

2.
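The digital humanities approach brings digital resources or digital tools into a humanities scholar's complete research process: starting from the formulation of the research question, it covers collecting and acquiring materials, sorting and organizing them, and analysis and observation, on which basis the researcher can interpret the materials and go on to produce research results. Starting from this methodology, this paper describes how the Research Center for Digital Humanities at National Taiwan University (台湾大学数位人文研究中心), in reorganizing and systematizing the Danxin Archives (淡新档案), took digital humanities research as the guiding orientation of the system and incorporated three key system concepts: deepening of disciplinary elements, extension of research functions, and linkage of research awareness. The work adopts a multi-context relational structure, refines the level of archival organization from the "case" (案) to the "item" (件), annotates the full text of each item for persons, times, places, objects, and Hakka-related terms, and, beyond the classification of 戴炎辉 (Tai Yen-hui), builds a Hakka event subject framework to meet the needs of Hakka studies. With the archive content deepened in this way, the DocuSky cloud database model is used to build the "Danxin Archives System for Hakka Studies" (DASH), which gives system users a range of functions for interactively exploring the materials and lets them export the texts and value-added information they need, in full, in the standard DocuXML format, so that they can connect it to their own topical research and begin a personalized digital humanities research process.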

3.
As geospatial missions age, one challenge for data usability is the availability of relevant, up-to-date metadata with sufficient documentation for future generations of users to gain knowledge from the original data. Because remote sensing data, for example, undergo many intermediate processing steps, an understanding of the exact algorithms employed and the quality of the data produced could be a key consideration for these users. As interest in global climate data increases, documentation about older data, their origins, and their provenance is valuable to first-time users attempting historical climate research or comparative analysis of global change. Incomplete or missing documentation could be what stands in the way of a new researcher attempting to use the data. Therefore, preservation of documentation and related metadata is sometimes just as critical as the preservation of the original observational data. The Goddard Earth Sciences–Data and Information Service Center (GES DISC), a NASA Earth science Distributed Active Archive Center (DAAC) that falls under the management structure of the Earth Science Data and Information System (ESDIS), is actively pursuing the preservation of all artifacts needed by future users.

In this article, we detail the data custodial planning and the data lifecycle process developed for content preservation, and our implementation of a Preservation System to safeguard documents and associated artifacts from legacy (older) missions. We also detail lessons learned regarding access rights and confidentiality of information. We elaborate on the key points that made our preservation effort successful, the primary ones being the drafting of a governing baseline for historical data preservation from satellite missions and the use of that baseline as a guide for filtering which documents to preserve. The Preservation System currently archives documentation content for High Resolution Dynamics Limb Sounder (HIRDLS), Upper Atmosphere Research Satellite (UARS), Total Ozone Mapping Spectrometer (TOMS) mission data, and the 1960s-era Nimbus mission. Documentation from other missions, such as the Tropical Rainfall Measuring Mission (TRMM), the Ozone Monitoring Instrument (OMI), and the Atmospheric Infra-Red Sounder (AIRS), is also slated to be added to this repository, as are other mission datasets to be preserved at the GES DISC.

4.
To obtain high precision at top ranks by a search performed in response to a query, researchers have proposed a cluster-based re-ranking paradigm: clustering an initial list of documents that are the most highly ranked by some initial search, and using information induced from these (often called) query-specific clusters for re-ranking the list. However, results concerning the effectiveness of various automatic cluster-based re-ranking methods have been inconclusive. We show that using query-specific clusters for automatic re-ranking of top-retrieved documents is effective with several methods in which clusters play different roles, among which is the smoothing of document language models. We do so by adapting previously-proposed cluster-based retrieval approaches, which are based on (static) query-independent clusters for ranking all documents in a corpus, to the re-ranking setting wherein clusters are query-specific. The best performing method that we develop outperforms both the initial document-based ranking and some previously proposed cluster-based re-ranking approaches; furthermore, this algorithm consistently outperforms a state-of-the-art pseudo-feedback-based approach. In further exploration we study the performance of cluster-based smoothing methods for re-ranking with various (soft and hard) clustering algorithms, and demonstrate the importance of clusters in providing context from the initial list through a comparison to using single documents to this end.
Oren Kurland
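Cluster-based smoothing of document language models, one of the roles for clusters mentioned in this abstract, can be sketched as follows. This is an illustrative toy, not the paper's method: the three-way linear interpolation, the weights `lam` and `mu`, the `floor` probability, and the function names are all assumptions, the clusters are supplied by hand rather than produced by a clustering algorithm, and the "collection" model is estimated from the initial list only.

```python
import math
from collections import Counter
from typing import Dict, List


def mle(tokens: List[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram language model."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def rerank(
    query: List[str],
    docs: Dict[str, List[str]],
    clusters: List[List[str]],
    lam: float = 0.5,
    mu: float = 0.3,
    floor: float = 1e-6,
) -> List[str]:
    """Re-rank an initial list by query log-likelihood under smoothed models.

    docs: doc_id -> tokens of each document in the initial list.
    clusters: hard query-specific clusters; every doc_id must appear in one.
    """
    # Background model, estimated here from the initial list only (assumption).
    collection = mle([w for tokens in docs.values() for w in tokens])
    cluster_lm = {}
    for members in clusters:
        lm = mle([w for doc_id in members for w in docs[doc_id]])
        for doc_id in members:
            cluster_lm[doc_id] = lm
    scores = {}
    for doc_id, tokens in docs.items():
        doc_lm = mle(tokens)
        score = 0.0
        for word in query:
            # Interpolate document, cluster, and background probabilities.
            p = (
                lam * doc_lm.get(word, 0.0)
                + mu * cluster_lm[doc_id].get(word, 0.0)
                + (1.0 - lam - mu) * collection.get(word, floor)
            )
            score += math.log(p)
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": ["jaguar", "car", "speed"],
    "d2": ["jaguar", "cat", "jungle"],
    "d3": ["car", "engine", "speed"],
}
ranking = rerank(["car", "engine"], docs, clusters=[["d1", "d3"], ["d2"]])
```

In this toy input, `d1` never mentions "engine" but shares a cluster with `d3`, so the cluster model lends it probability mass for that term; this is the kind of context from the initial list that the abstract attributes to clusters.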

5.
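朱祥 (Zhu Xiang), 张云秋 (Zhang Yunqiu). 《图书情报工作》 (Library and Information Service), 2019, 63(16): 143-150
[Purpose/Significance] To review and evaluate recent research on knowledge fusion, in order to provide a reference for future work. [Method/Process] The paper first analyzes the concept of knowledge fusion, then reviews its frameworks, processes, and methods, next summarizes research trends, and finally offers an outlook. [Result/Conclusion] Knowledge fusion research shows new characteristics in the big data environment but cannot yet meet that environment's requirements. Future research should proceed in four directions: building a layered, multidimensional big data knowledge fusion framework; improving the efficiency of knowledge fusion; building real-time dynamic fusion mechanisms; and carrying out empirical application research on big data.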

6.
Under the National Innovation System (NIS) framework, knowledge stock has been recognized as a key factor in enhancing national innovative capabilities. However, despite the importance of patents and papers for measuring knowledge, previous research has not fully utilized patent and paper databases and has instead relied on research and development (R&D) data. Therefore, in this research, I introduce a way to utilize both types of data when measuring industrial knowledge stocks. The primary data sources are the United States Patent and Trademark Office (USPTO) website for patents and the Science Citation Index (SCI) for papers. In the case of Korea, the amount of knowledge stock proxied by patents and papers differs from that proxied by R&D, which in turn indicates that using a single indicator such as R&D may be misleading. Although the result may vary depending on the selected nation, the proposed method will be useful for gauging knowledge stocks in a more complementary way.

7.
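The International Metropolitan Library Indicator System (《国际大都市图书馆指标体系》) consists of four major parts: resource conditions, service efficiency, service outcomes, and impact and contribution, with evaluation indicators listed under each. Taking several metropolitan libraries in China and abroad as examples and using a number of evaluation indicators as the standard, this paper analyzes and compares the similarities and differences among these libraries. In line with the principles of the research project behind the Indicator System, the aim is not to judge which libraries are better or worse but to identify similarities and differences and their causes, so that libraries can learn from one another's strengths, and to provide usable information for the development of metropolitan libraries in China.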

8.
ABSTRACT

Many library patrons have one or more print disabilities that partially or totally impair their ability to use standard print. However, many devices and services are available to aid patrons who cannot use print in the conventional manner. This paper provides an overview of typical assistive technology devices and services available to students with print disabilities. Barriers to access and solutions to these barriers are also discussed. In addition, original research results are presented from a survey of two-year college libraries of the University System of Georgia.

9.
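The Position and Role of Library and Information Work in the Knowledge Innovation System   Total citations: 2; self-citations: 0; citations by others: 2
汪琼 (Wang Qiong), 秦铁辉 (Qin Tiehui). 《图书情报工作》 (Library and Information Service), 2005, 49(3): 10-12, 26
Innovation is the source of economic growth and the driving force of national development. The building of a national knowledge innovation system bears on the strength of a country's competitiveness. The knowledge innovation system comprises three major sectors: knowledge innovation, knowledge dissemination, and knowledge application. As members of the knowledge dissemination sector, library and information institutions play an important role in the system: they can promote the production of innovative results, enable those results to be converted quickly into productive forces, accelerate interaction among the subsystems of the innovation system, and help cultivate innovative talent.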

10.
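[Purpose/Significance] As human society moves from the information society to the post-information era, it is undergoing a leap from "information" to "intelligence". Driven by both the big data environment and advances in artificial intelligence technology, knowledge fusion, as a key link in the process of becoming knowledge-based and intelligent, provides important theoretical and technical support for improving knowledge services and smart services and for fostering advanced forms of intelligence. Surveying knowledge fusion from an all-discipline perspective broadens the scope of knowledge fusion research and offers a reference for comprehensively describing the state of that research and for building a unified theoretical research framework. [Method/Process] Using literature analysis that combines quantitative and qualitative methods, this paper analyzes the state of knowledge fusion research from different disciplinary perspectives and summarizes the main research content and issues of concern in each discipline, the factors involved in knowledge fusion, and its application scenarios. [Result/Conclusion] Knowledge fusion research is a highly interdisciplinary field with blurred conceptual boundaries and scattered research areas, and no unified research framework has yet formed. Through a literature survey, and on the basis of a thorough summary of previous results, this paper classifies the research orientations of knowledge fusion and offers suggestions for future knowledge fusion research.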
