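Research on Unlisted Word Identification Techniques in Adaptive Word Segmentation Algorithms
Cite this article: Cheng Chong, Huang Shuiqing. Research on Unlisted Word Identification Techniques in Adaptive Word Segmentation Algorithms[J]. Journal of the China Society for Scientific and Technical Information, 2009, 28(4).
Authors: Cheng Chong  Huang Shuiqing
Institution: College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095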
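Abstract: This paper studies unlisted word identification in depth and proposes a new unlisted word identification algorithm. The algorithm comprises a numeral-quantifier identification rule, a boundary single-character rule, a function-character auxiliary rule, a memory-based unlisted word identification rule, and a rule that selects unlisted words by probing to the left and right. Together these rules let the algorithm identify many types of unlisted words across many domains effectively, without relying on a large corpus. At the same time, by identifying the great majority of crossing ambiguities, the algorithm effectively resolves the new segmentation ambiguities that unlisted word identification introduces. In open tests on online current-affairs articles, the segmentation accuracy was about 90.1%, and the precision and recall of unlisted word identification were 91.2% and 94.7%, respectively.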

Keywords: Chinese word segmentation  unlisted word identification  crossing ambiguity  Chinese word segmentation system

Research on Unlisted Words Identification in Chinese Self-adaptive Segmentation
Cheng Chong, Huang Shuiqing. Research on Unlisted Words Identification in Chinese Self-adaptive Segmentation[J]. Journal of the China Society for Scientific and Technical Information, 2009, 28(4).
Authors: Cheng Chong  Huang Shuiqing
Institution: College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095
Abstract: This paper studies unlisted word identification and proposes a new identification algorithm composed of several rules: a rule for identifying numerals and quantifiers, auxiliary rules for border words, auxiliary rules for function words, a memory-based rule for identifying unlisted words, and a rule that identifies unlisted words by right or left detection. At the same time, by comparing the results of the bi-directional segmentation algorithm,...
Keywords: Chinese segmentation  unlisted words identification  crossing ambiguity  Chinese segmentation system
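
The abstract mentions comparing the results of a bi-directional segmentation algorithm, and this record does not describe how the paper does it. As a minimal sketch of the general idea only, the Python below assumes plain forward and backward maximum matching over a small dictionary and flags the character spans where the two segmentations disagree as candidate crossing ambiguities. The function names, the `max_len` parameter, and the toy lexicon are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Set, Tuple


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right longest match; unmatched characters become 1-char tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy right-to-left longest match; unmatched characters become 1-char tokens."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def boundaries(tokens: List[str]) -> Set[int]:
    """Character offsets at which a token ends."""
    ends, pos = set(), 0
    for token in tokens:
        pos += len(token)
        ends.add(pos)
    return ends


def disagreement_spans(text: str, lexicon: Set[str], max_len: int = 4) -> List[Tuple[int, int]]:
    """Spans (start, end) between shared boundaries where the two segmentations differ."""
    fwd = boundaries(forward_max_match(text, lexicon, max_len))
    bwd = boundaries(backward_max_match(text, lexicon, max_len))
    shared = sorted((fwd & bwd) | {0})
    differing = fwd ^ bwd  # boundaries proposed by only one direction
    return [
        (start, end)
        for start, end in zip(shared, shared[1:])
        if any(start < b < end for b in differing)
    ]


# Hypothetical toy lexicon for a classic crossing-ambiguity example.
toy_lexicon = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"
print(forward_max_match(sentence, toy_lexicon))   # ['研究生', '命', '起源']
print(backward_max_match(sentence, toy_lexicon))  # ['研究', '生命', '起源']
print(disagreement_spans(sentence, toy_lexicon))  # [(0, 4)]
```

In this toy case the two directions disagree only on the first four characters, so that span would be handed to whatever disambiguation step follows. How the paper resolves such spans is not stated in this record.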
This article is indexed in CNKI, Wanfang Data, and other databases.