A Constituent Parser for Science and Technology Corpus
Cite this article: WANG YaNan, MA ChunPeng, CAO HaiLong, ZHAO TieJun. 面向科技语料的短语结构句法分析器 [A Constituent Parser for Science and Technology Corpus][J]. 情报工程, 2017, 3(3): 010-020.
Authors: WANG YaNan, MA ChunPeng, CAO HaiLong, ZHAO TieJun
Institution: Machine Intelligence and Translation Laboratory, Harbin Institute of Technology (all four authors)
Funding: This work was supported by the National Natural Science Foundation of China (grants 91520204 and 61572154), the 863 Program (grant 2015AA015405), and the Microsoft Research Asia collaborative research program.
Abstract: This paper presents a constituent parser for science and technology corpora, designed and developed at Harbin Institute of Technology. Unlike traditional constituent parsers, it does not require the input corpus to be pre-processed. Given raw text as input, the parser jointly performs word segmentation, POS tagging, and constituent parsing, which can be regarded as an instance of multi-task learning. Furthermore, to suit the characteristics of science and technology corpora, we optimized the feature templates used by the parser and constructed a new treebank of word-internal structures for science and technology corpora. Experimental results show that the parser performs well on both a general-domain test set and a science and technology domain test set.

Keywords: constituent parsing, science and technology corpus, multi-task learning
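
The joint analysis described in the abstract can be pictured as a single tree whose leaves are characters, so that word boundaries, POS tags, and phrase structure are all read off one structure. The following is only an illustrative sketch and not the authors' implementation: the `Node` class, the `kind` values ("phrase", "word", "inner", "char"), the helper names, and the example tree are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One node of a hypothetical character-level constituent tree."""
    label: str   # phrase label, POS tag, or a single character
    kind: str    # "phrase", "word", "inner" (word-internal), or "char"
    children: List["Node"] = field(default_factory=list)


def ch(c: str) -> Node:
    """Build a character leaf."""
    return Node(c, "char")


def chars(node: Node) -> str:
    """Concatenate the characters covered by a node."""
    if node.kind == "char":
        return node.label
    return "".join(chars(c) for c in node.children)


def tagged_words(node: Node) -> List[Tuple[str, str]]:
    """Read segmentation and POS tags off the tree: each "word" node is one word."""
    if node.kind == "word":
        return [(chars(node), node.label)]
    out: List[Tuple[str, str]] = []
    for child in node.children:
        out.extend(tagged_words(child))
    return out


def bracketed(node: Node) -> str:
    """Render the word-level phrase structure, hiding word-internal nodes."""
    if node.kind == "word":
        return f"({node.label} {chars(node)})"
    inner = " ".join(bracketed(c) for c in node.children)
    return f"({node.label} {inner})"


# Hand-built example tree for the raw string "句法分析器" (an assumed analysis).
tree = Node("NP", "phrase", [
    Node("NN", "word", [ch("句"), ch("法")]),
    Node("NN", "word", [
        Node("NN", "inner", [ch("分"), ch("析")]),  # word-internal structure
        ch("器"),
    ]),
])

print(chars(tree))         # the raw, unsegmented input
print(tagged_words(tree))  # segmentation and POS tags
print(bracketed(tree))     # word-level constituent tree
```

Because all three outputs come from the same tree, a parser that predicts such a tree directly from raw characters needs no separate segmentation or tagging step, which is the sense in which the three tasks are solved jointly.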