
A Study of Word Segmentation Inconsistency Based on Lexicon and Morphology
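Cite this article: DONG Yu, CHEN Xiaohe. 基于词库与词法的分词不一致研究 (A Study of Word Segmentation Inconsistency Based on Lexicon and Morphology) [J]. 浙江教育学院学报 (Journal of Zhejiang Education Institute), 2008(3): 96-102.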
Authors: DONG Yu (董宇), CHEN Xiaohe (陈小荷)
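Affiliations: 1. Longpan College, Jinling Institute of Technology, Nanjing, Jiangsu 211169
2. School of the Humanities, Nanjing Normal University, Nanjing, Jiangsu 210097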
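Abstract: Inconsistent word segmentation has long seriously degraded the annotation quality of tagged corpora. Knowledge of the lexicon and of morphology gives a principled account of where segmentation inconsistencies come from. Combined with a purpose-built rule base, a combinational ambiguity lexicon, a fixed-word list and a list of special single-character words, this knowledge can resolve segmentation inconsistencies among items of the same structural type in a Chinese word-segmented corpus. Using it, the computer recognized "大 + verb (single character)", verb-complement structures and "color word + object name" well, with recall above 96% and precision above 95%, and it can output them uniformly in either "split" or "joined" form as the user requires.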

Keywords: word segmentation inconsistency; lexicon; morphology; automatic word segmentation
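The idea of normalizing one structural type consistently to a user-chosen "split" or "joined" form can be illustrated with a small Python sketch. It is not the authors' system: the lexicons `VERBS`, `COMPLEMENTS`, `COLOR_WORDS`, `OBJECT_NAMES` and `FIXED_WORDS` are hypothetical toy lists, the functions `is_target` and `normalize` are illustrative names, and the rule base and combinational ambiguity lexicon mentioned in the abstract are not modeled; joining is simply a greedy left-to-right merge of two adjacent single-character tokens.

```python
from typing import List

# Hypothetical toy lexicons for illustration; not the resources built in the paper.
VERBS = {"笑", "哭", "看", "打", "听", "做"}   # single-character verbs
COMPLEMENTS = {"完", "破", "好", "懂"}          # single-character resultative complements
COLOR_WORDS = {"红", "白", "黑", "绿"}
OBJECT_NAMES = {"旗", "布", "花", "菜"}
FIXED_WORDS = {"白菜", "红花"}                 # lexicalized words exempt from normalization


def is_target(first: str, second: str) -> bool:
    """Return True if first+second belongs to one of the three structures handled here."""
    if first + second in FIXED_WORDS:
        return False
    if first == "大" and second in VERBS:            # 大 + single-character verb
        return True
    if first in VERBS and second in COMPLEMENTS:     # verb-complement
        return True
    return first in COLOR_WORDS and second in OBJECT_NAMES  # color word + object name


def normalize(tokens: List[str], mode: str) -> List[str]:
    """Rewrite every matched structure uniformly as two tokens ("split") or one ("join")."""
    out: List[str] = []
    if mode == "split":
        for tok in tokens:
            if len(tok) == 2 and is_target(tok[0], tok[1]):
                out.extend([tok[0], tok[1]])
            else:
                out.append(tok)
        return out
    if mode == "join":
        i = 0
        while i < len(tokens):
            a = tokens[i]
            b = tokens[i + 1] if i + 1 < len(tokens) else ""
            # Greedy left-to-right merge of two adjacent single-character tokens.
            if len(a) == 1 and len(b) == 1 and is_target(a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(a)
                i += 1
        return out
    raise ValueError("mode must be 'split' or 'join'")
```

For example, `normalize(["他", "大", "笑", "起来"], "join")` yields `["他", "大笑", "起来"]`, and `"split"` mode reverses it; 白菜 is left whole because it is on the (hypothetical) fixed-word list.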

On the Disagreement of Participles Based on Lexicon and Morphology
DONG Yu, CHEN Xiaohe. On the Disagreement of Participles Based on Lexicon and Morphology [J]. Journal of Zhejiang Education Institute, 2008(3): 96-102.
Authors: DONG Yu, CHEN Xiaohe
Institution: 1. Longpan College, Jinling Institute of Technology, Nanjing 211169, Jiangsu, China; 2. School of the Humanities, Nanjing Normal University, Nanjing 210097, Jiangsu, China
Abstract: Inconsistent word segmentation has long degraded the annotation quality of tagged corpora. Knowledge of the lexicon and of morphology can rationally explain where segmentation inconsistencies originate, and, together with a rule base, a combinational ambiguity lexicon, a fixed-word list and a list of special single-character words, it can resolve inconsistencies among items of the same structural type in a Chinese word-segmented corpus. Using this knowledge, the computer recognizes "大 + single-character verb", verb-complement structures and "color word + object name", with recall above 96% and precision above 95%. It can also output them uniformly in either "separated" or "combined" form according to users' requirements.
Keywords: disagreement of participles; lexicon; morphology; automatic participles
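The recall and precision figures reported in the abstract presumably follow the usual definitions: recall is the number of correctly recognized instances divided by the number of instances in the gold standard, and precision is the number of correctly recognized instances divided by the number of instances the system recognized. A minimal Python sketch, assuming each instance is represented as a hypothetical `(offset, string)` pair; the representation and the function name `precision_recall` are illustrative, not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical instance representation: (character offset in the corpus, matched string).
Span = Tuple[int, str]


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Return (precision, recall) for recognized instances against a gold standard."""
    correct = len(predicted & gold)  # instances both recognized and present in the gold standard
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

With three correct recognitions out of three predicted and four gold instances, this gives precision 1.0 and recall 0.75; the counts in such an example are made up and are not the paper's data.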
This article is indexed in VIP (维普), Wanfang Data (万方数据) and other databases.