
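A Method for Acquiring Bilingual Corpora Based on Patent Families: The Case of Chinese-English Bilingual Corpora
Cite this article: Huo Cuiting, Wu Lin. A Method for Acquiring Bilingual Corpora Based on Patent Families: The Case of Chinese-English Bilingual Corpora[J]. 数字图书馆论坛 (Digital Library Forum), 2009(11):67-71.
Authors: Huo Cuiting (霍翠婷), Wu Lin (吴琳)
Affiliation: Institute of Scientific and Technical Information of China, Beijing 100038
Funding: One of the research outputs of the National Key Technology R&D Program projects "Research and Application of Key Technologies for a Multilingual Information Service Environment" and "Application Demonstration of the Scientific and Technical Literature Information Service System" (2006BAH03B06)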
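Abstract: Bilingual corpora are increasingly important in natural language processing tasks such as machine translation, cross-language information retrieval, and the compilation of translation dictionaries. This study uses patent family documents as a source of bilingual text and examines the feasibility of acquiring bilingual corpora from them. Taking Chinese-English corpora as an example, it proposes an acquisition workflow and studies alignment rules for the mutually translated parts, so as to build a parallel bilingual corpus for the science and technology domain. Finally, it discusses points to note when applying the method and its application prospects.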

Keywords: patent family; bilingual corpus

The Method of Bilingual Corpus Extraction Based on Patent Family-Case study for English-Chinese Bilingual Corpus
Huo Cuiting,Wu Lin.The Method of Bilingual Corpus Extraction Based on Patent Family-Case study for English-Chinese Bilingual Corpus[J].Digital Library Forum,2009(11):67-71.
Authors: Huo Cuiting, Wu Lin
Institution: Institute of Scientific and Technical Information of China, Beijing 100038
Abstract: Bilingual corpora have become an increasingly valuable resource for machine translation, cross-language information retrieval, translation dictionary compilation, and other applications. This paper describes a new method for extracting bilingual corpora from patent families, taking an English-Chinese bilingual corpus as an example. First, it discusses the feasibility of the approach and designs a process for extracting a high-quality bilingual corpus from two selected patent databases. Then, English-Chinese equivalent units are obtained from the extracted corpus through alignment rules, in order to build a Chinese-English parallel corpus for a specific area of science and technology. Finally, it describes points to note and the application prospects of the method.
Keywords: bilingual corpus; patent family
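
The workflow summarized in the abstract (pair the Chinese and English members of a patent family, then align their translated parts) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `PatentRecord` type, the `family_id` field, the function names, and the sample records are all hypothetical, and the field-by-field alignment shown here (title with title, abstract with abstract) is an assumed stand-in for the paper's alignment rules, which the abstract does not spell out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PatentRecord:
    family_id: str  # hypothetical ID shared by all members of one patent family
    lang: str       # "zh" or "en"
    title: str
    abstract: str


def pair_family_members(records):
    """Group records by family; return (zh, en) pairs for families that have both."""
    by_family = defaultdict(dict)
    for rec in records:
        # Keep the first record seen for each language within a family.
        by_family[rec.family_id].setdefault(rec.lang, rec)
    pairs = []
    for family_id in sorted(by_family):
        members = by_family[family_id]
        if "zh" in members and "en" in members:
            pairs.append((members["zh"], members["en"]))
    return pairs


def align_fields(zh, en):
    """Assumed field-level alignment: title with title, abstract with abstract."""
    return [(zh.title, en.title), (zh.abstract, en.abstract)]


# Made-up sample data: family F1 has both languages, family F2 only English.
records = [
    PatentRecord("F1", "zh", "一种电池", "本发明公开了一种电池。"),
    PatentRecord("F1", "en", "A battery", "The invention discloses a battery."),
    PatentRecord("F2", "en", "A valve", "A valve is disclosed."),
]

pairs = pair_family_members(records)
corpus = [unit for zh, en in pairs for unit in align_fields(zh, en)]
```

Families that lack one of the two languages are simply skipped in this sketch; a real pipeline would presumably also need sentence-level alignment inside longer fields such as claims and descriptions.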
This article is indexed in 维普 (VIP), 万方数据 (Wanfang Data), and other databases.