
Sub-sentence Alignment and Its Application for Statistical Patent Machine Translation

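Cite this article:	He Yanqing, Zhang Juan. Sub-sentence Alignment and Its Application for Statistical Patent Machine Translation[J]. China Information Review, 2014(4): 86-93.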

Authors:	He Yanqing (何彦青), Zhang Juan (张娟)

Affiliations:	1. Institute of Scientific and Technical Information of China, Beijing 100038; 2. Beijing Union University, Beijing 100101

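Funding:	National Natural Science Foundation of China project "Context Analysis for Statistical Machine Translation of Patent Documents" (61303152); National Key Technology R&D Program of the 12th Five-Year Plan project "Research on Key Technologies for Electric Vehicle Data Mining Based on Multi-Source Information" (2013BAG06B01); International Science and Technology Cooperation Program of China project "Cooperative Research on Practical Japanese-Chinese Bidirectional Machine Translation for Scientific and Technical Literature" (2014DFA11350).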

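Abstract:	Sentences in patent documents tend to be long. To address this, we segment the training corpus of a statistical machine translation (SMT) system into sub-sentences, obtaining bilingual sub-sentence sequences, and then generate sub-sentence alignments with a strategy that combines statistics and rules. The resulting bilingual corpus of simple sub-sentences is added to the training data to retrain the SMT system. This improves, to some extent, the phrase and word alignments in the original bilingual training corpus and exploits the translation information contained in the parallel corpus more fully. Applied to patent SMT and evaluated on the NTCIR-9 test set, the method achieved fairly satisfactory translation results.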
Keywords:	sub-sentence alignment; word alignment; simple sub-sentence; patent documents; statistical machine translation
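The abstract does not spell out the segmentation rules or the statistical model, so the following Python sketch only illustrates the general pipeline under stated assumptions; it is not the authors' method. It assumes a Chinese-English pair, splits each sentence after clause-level punctuation, and aligns the resulting sub-sentences with a simplified length-based dynamic program (a swapped-in statistical component, loosely in the spirit of Gale and Church's sentence aligner), plus a hypothetical rule that rewards both sides ending with the same punctuation class. All function names, the penalty constants, and the `ratio` parameter are hypothetical.

```python
import math
import re
from typing import List, Optional, Tuple

# Hypothetical segmentation rule: break after full-width clause punctuation,
# or after ASCII , ; : when followed by whitespace (so "1,000" stays intact).
_CLAUSE_BREAK = re.compile(r"(?<=[，；：])|(?<=[,;:])(?=\s)")

# Hypothetical bead types (source clauses, target clauses) and costs.
BEADS = [(1, 1), (1, 0), (0, 1), (1, 2), (2, 1)]
SKIP_PENALTY = 2.0   # 1-0 or 0-1 bead: clause left unaligned
MERGE_PENALTY = 1.0  # 1-2 or 2-1 bead: clauses merged on one side
PUNCT_BONUS = 0.5    # rule: both sides end with the same punctuation class


def split_subsentences(sentence: str) -> List[str]:
    """Split a sentence into sub-sentences after clause-level punctuation."""
    parts = (p.strip() for p in _CLAUSE_BREAK.split(sentence))
    return [p for p in parts if p]


def _end_class(text: str) -> str:
    """Coarse class of the clause-final mark; '' if there is none."""
    classes = {"，": ",", ",": ",", "；": ";", ";": ";", "：": ":", ":": ":"}
    return classes.get(text[-1:], "")


def bead_cost(src: List[str], tgt: List[str], ratio: float) -> float:
    """Statistical length cost plus a rule-based punctuation bonus."""
    if not src or not tgt:
        return SKIP_PENALTY
    ls = sum(len(s) for s in src)
    lt = sum(len(t) for t in tgt)
    # Statistical part: deviation from the expected target/source length ratio.
    cost = abs(math.log((lt + 1) / (ratio * (ls + 1))))
    if len(src) + len(tgt) > 2:
        cost += MERGE_PENALTY
    # Rule part: same clause-final punctuation class (or neither has one).
    if _end_class(src[-1]) == _end_class(tgt[-1]):
        cost -= PUNCT_BONUS
    return cost


def align(src: List[str], tgt: List[str], ratio: float = 1.0
          ) -> List[Tuple[List[str], List[str]]]:
    """Minimum-cost monotone alignment of two sub-sentence sequences."""
    m, n = len(src), len(tgt)
    best = [[math.inf] * (n + 1) for _ in range(m + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    # Row-major order visits every cell after all of its predecessors.
    for i in range(m + 1):
        for j in range(n + 1):
            if best[i][j] == math.inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > m or nj > n:
                    continue
                c = best[i][j] + bead_cost(src[i:ni], tgt[j:nj], ratio)
                if c < best[ni][nj]:
                    best[ni][nj], back[ni][nj] = c, (i, j)
    # Trace the best path back from the end of both sequences.
    pairs = []
    i, j = m, n
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return pairs[::-1]


def subsentence_pairs(src_sent: str, tgt_sent: str, ratio: float = 1.0
                      ) -> List[Tuple[str, str]]:
    """Aligned (Chinese, English) sub-sentence pairs; unaligned clauses dropped."""
    beads = align(split_subsentences(src_sent), split_subsentences(tgt_sent), ratio)
    return [("".join(s), " ".join(t)) for s, t in beads if s and t]
```

For an invented example, `subsentence_pairs("该装置包括外壳，外壳内设有电池；电池连接控制器。", "The device comprises a housing, a battery is arranged in the housing; the battery is connected to a controller.", ratio=4.0)` yields three one-to-one clause pairs. In the setting the abstract describes, such pairs would be added to the original sentence-level corpus before retraining. The value `ratio=4.0` is a rough, hypothetical guess at English characters per Chinese character, not a figure from the paper.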
