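Sub-sentence Alignment and Its Application for Statistical Patent Machine Translation
Cite this article: He Yanqing, Zhang Juan. Sub-sentence Alignment and Its Application for Statistical Patent Machine Translation[J]. China Information Review (中国信息导报), 2014(4): 86-93.
Authors: He Yanqing, Zhang Juan
Affiliations: 1. Institute of Scientific and Technical Information of China, Beijing 100038; 2. Beijing Union University, Beijing 100101
Funding: National Natural Science Foundation of China project "Context Analysis for Statistical Machine Translation of Patent Documents" (61303152); 12th Five-Year National Key Technology R&D Program project "Research on Key Technologies for Electric Vehicle Data Mining Based on Multi-source Information" (2013BAG06B01); International Science and Technology Cooperation Program of China project "Cooperative Research on Practical Japanese-Chinese Bidirectional Machine Translation for Scientific and Technical Literature" (2014DFA11350).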
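Abstract: Because sentences in patent documents tend to be long, the training corpus for statistical machine translation is segmented into clauses to obtain bilingual sub-sentence sequences, and a strategy that combines statistics with rules is then used to align these sub-sentences. A bilingual corpus built from simple sub-sentences is used to retrain the statistical machine translation system. This improves, to some extent, the phrase alignment and word alignment of the original bilingual training corpus and makes deeper use of the translation information contained in the parallel corpus. Applied to statistical patent machine translation and evaluated on the NTCIR-9 test set, the method achieves fairly satisfactory translation results.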

Keywords: sub-sentence alignment; word alignment; simple clause; patent documents; statistical machine translation
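The abstract does not state how sentences are cut into sub-sentences. As a minimal sketch, assuming a Chinese-English corpus (the language pair is an assumption here) and assuming clauses end at clause-level punctuation, segmentation could look like the following. The delimiter sets, the rule that a period or comma directly followed by a digit is not a boundary, and the name `split_clauses` are all hypothetical and not taken from the paper.

```python
import re

# Hypothetical delimiter sets; the paper does not publish its segmentation rules.
ZH_DELIMS = "，；：。！？"
EN_DELIMS = ",;:.!?"


def split_clauses(sentence: str, delims: str) -> list[str]:
    """Split a sentence into sub-sentences immediately after each delimiter.

    The lookbehind keeps each punctuation mark attached to the clause it ends;
    the (?!\\d) lookahead avoids splitting numbers such as "3.5" or "3,000".
    """
    pattern = "(?<=[" + re.escape(delims) + r"])(?!\d)"
    parts = re.split(pattern, sentence)
    # Drop whitespace and the empty piece produced after the final delimiter.
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    print(split_clauses("本发明涉及一种装置，包括底座和盖板；盖板可旋转。", ZH_DELIMS))
    print(split_clauses("The device comprises a base, and the cover weighs 3.5 kg; it rotates.", EN_DELIMS))
```

Splitting at every comma is deliberately aggressive; a real system would likely add rules to merge fragments that are too short to translate on their own.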

Sub-sentence Alignment and Its Application for Statistical Patent Machine Translation
He Yanqing, Zhang Juan. Sub-sentence Alignment and Its Application for Statistical Patent Machine Translation[J]. China Information Review, 2014(4): 86-93.
Authors: He Yanqing, Zhang Juan
Institution: 1. Institute of Scientific and Technical Information of China, Beijing 100038; 2. Beijing Union University, Beijing 100101
Abstract: Because sentences in patent documents are often long, this paper segments the training corpus of statistical machine translation into bilingual sub-sentence lists and combines statistical strategies with rules to obtain their sub-sentence alignment. The newly generated training corpus, based on simple sub-sentences, is then added to the training data to train the statistical machine translation system. This method improves phrase alignment and word alignment in the bilingual training corpus, exploits the translation information in the parallel corpus more deeply, and improves translation quality. The method was applied to statistical patent machine translation, and experiments on the NTCIR-9 test set gave satisfactory translation results.
Keywords: sub-sentence alignment; word alignment; simple sentence; patent text; statistical machine translation
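The abstract says only that sub-sentence alignment combines statistics with rules. One way such a combination could work, sketched here under explicit assumptions, is a simplified length-based dynamic-programming aligner in the spirit of Gale and Church (1993), which is a swapped-in technique and not the authors' published method. A log length-ratio cost serves as the statistical part and a numeric-token agreement check serves as the rule. The expected length ratio `LEN_RATIO`, the move penalties in `MOVES`, the rule penalty, and all function names are hypothetical values chosen for illustration.

```python
import math
import re

# Hypothetical: expected English characters per Chinese character.
LEN_RATIO = 3.0
# Allowed (Chinese clauses, English clauses) groupings with hypothetical prior penalties.
MOVES = {(1, 1): 0.0, (1, 2): 1.0, (2, 1): 1.0, (1, 0): 3.0, (0, 1): 3.0}
NUM = re.compile(r"\d+(?:\.\d+)?")


def length_cost(zh: str, en: str) -> float:
    """Statistical part: penalize deviation from the expected length ratio."""
    if not zh or not en:
        return 0.0  # deletions/insertions are priced by MOVES alone
    return abs(math.log(len(en) / (LEN_RATIO * len(zh))))


def number_cost(zh: str, en: str) -> float:
    """Rule part (hypothetical): numbers on one side should appear on the other."""
    return 0.0 if set(NUM.findall(zh)) == set(NUM.findall(en)) else 2.0


def align_clauses(zh: list[str], en: list[str]) -> list[tuple[str, str]]:
    """Return the minimum-cost monotone alignment of Chinese and English clauses."""
    n, m = len(zh), len(en)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before expansion.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in MOVES.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                z, e = "".join(zh[i:ni]), " ".join(en[j:nj])
                c = cost[i][j] + prior + length_cost(z, e) + number_cost(z, e)
                if c < cost[ni][nj]:
                    cost[ni][nj], back[ni][nj] = c, (i, j)
    # Trace back from the final cell to recover the aligned clause groups.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append(("".join(zh[pi:i]), " ".join(en[pj:j])))
        i, j = pi, pj
    return pairs[::-1]


if __name__ == "__main__":
    zh = ["本发明涉及一种装置，", "包括底座和盖板；", "盖板可旋转。"]
    en = ["The invention relates to a device comprising a base and a cover;",
          "the cover is rotatable."]
    for pair in align_clauses(zh, en):
        print(pair)
```

In this example the first two Chinese clauses are merged (a 2-1 grouping) against the first English clause. Following the abstract, the resulting sub-sentence pairs would then be added to the sentence-level training data before the translation system is retrained.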