
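Research on Chinese Personal Name Recognition Based on Conditional Random Fields
Cite this article: QIU Sha, DUAN Bo, SHEN Hao-ru, DING Hai-yan. Research on Chinese Personal Name Recognition Based on Conditional Random Fields [J]. Journal of Kunming Teachers College, 2011(6): 64-66.
Authors: QIU Sha (邱莎)  DUAN Bo (段玻)  SHEN Hao-ru (申浩如)  DING Hai-yan (丁海燕)
Affiliations: [1] College of Information Technology, Kunming University, Kunming, Yunnan 650214; [2] College of Information, Yunnan University, Kunming, Yunnan 650091; [3] College of Computer Science and Technology, Fudan University, Shanghai 201203
Funding: Kunming University Scientific Research Project (2009G012)
Abstract: Exploiting the ability of conditional random fields (CRFs) to incorporate arbitrary features, this paper studies Chinese personal name recognition with a CRF model at the character level. Based on the basic features and context features of Chinese personal names as they appear in text, and taking the model's overall performance into account, CRF feature templates are constructed. The model is trained on a large-scale annotated corpus, which estimates the conditional probability distribution of Chinese personal names in text and yields the model parameters, and recognition is carried out as sequence labeling. Repeated closed and open tests show that the F-measure is mostly above 90%.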

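Keywords: named entity recognition  Chinese personal name recognition  conditional random fields  conditional probability  feature template  sequence labeling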
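The character-level, template-driven setup described in the abstract can be pictured with a small sketch. Everything specific here is an assumption made for illustration, since the record does not give the paper's actual templates or tag set: the window of two characters, the unigram and bigram feature names, the `<PAD>` boundary symbol, the B/I/O labels, and the helper names `char_features` and `decode_names` are all hypothetical. The sketch shows only the feature extraction a CRF would consume and the conversion of a predicted tag sequence back into name spans; it does not train a model.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # assumed boundary symbol, not taken from the paper


def char_features(sentence: str, i: int, window: int = 2) -> Dict[str, str]:
    """Build context features for the character at position i.

    Hypothetical template: every single character within `window`
    positions of i, plus every adjacent character pair in that window.
    """
    n = len(sentence)

    def at(j: int) -> str:
        # Characters outside the sentence are replaced by the pad symbol.
        return sentence[j] if 0 <= j < n else PAD

    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        feats[f"C[{off}]"] = at(i + off)
    for off in range(-window, window):
        feats[f"C[{off}]C[{off + 1}]"] = at(i + off) + at(i + off + 1)
    return feats


def decode_names(sentence: str, tags: List[str]) -> List[Tuple[int, int, str]]:
    """Turn per-character B/I/O tags into (start, end, text) name spans.

    The tag set is an assumption: B opens a name, I continues it, O is
    outside any name. An I with no open name is ignored.
    """
    if len(tags) != len(sentence):
        raise ValueError("one tag per character is required")
    spans: List[Tuple[int, int, str]] = []
    start = None
    # A trailing "O" closes a name that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag != "I" and start is not None:
            spans.append((start, i, sentence[start:i]))
            start = None
        if tag == "B":
            start = i
    return spans


example = "记者邱莎报道"
example_tags = ["O", "O", "B", "I", "O", "O"]  # hand-written tags, not model output
print(decode_names(example, example_tags))  # prints [(2, 4, '邱莎')]
```

In a full system, the feature dictionaries for each character would be passed to a CRF trainer together with gold tags, and the trained model's predicted tags would be decoded as above.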

Study on the Recognition of Names of Chinese People Based on Conditional Random Fields
QIU Sha, DUAN Bo, SHEN Hao-ru, DING Hai-yan. Study on the Recognition of Names of Chinese People Based on Conditional Random Fields [J]. Journal of Kunming Teachers College, 2011(6): 64-66.
Authors:QIU Sha  DUAN Bo  SHEN Hao-ru  DING Hai-yan
Institution: 1. College of Information Technology, Kunming University, Yunnan Kunming 650214, China; 2. College of Information, Yunnan University, Yunnan Kunming 650091, China; 3. College of Computer Science Technology, Fudan University, Shanghai 201203, China
Abstract: Taking advantage of the ability of conditional random fields (CRFs) to use arbitrary features as input, this paper addresses the recognition of Chinese personal names with a CRF model at the character level. Based on the basic and contextual features of Chinese personal names, and taking the overall performance of the model into account, a CRF feature template is constructed. The model is trained on a large-scale annotated corpus, estimating the conditional probability distribution of Chinese names in text to obtain the model parameters, and recognition is then performed by sequence labeling. On several closed and open test corpora, the F-measure is mostly above 90%.
Keywords: named entity recognition  Chinese people's name recognition  conditional random fields  conditional probability  feature template  sequence labeling
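The abstract reports results as an F-measure without stating how it is computed. A common convention for name recognition, assumed here and not confirmed by the record, is the balanced F1 score over exactly matching name spans: precision is the share of predicted names that are correct, recall is the share of gold names that were found, and F is their harmonic mean. The function name `span_prf` and the sample spans below are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of one name


def span_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span matching.

    Assumes balanced F1; the paper may weight precision and recall
    differently or score matches in another way.
    """
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans for illustration only; these are not results from the paper.
gold_spans = {(0, 2), (5, 7), (9, 12), (20, 22)}
pred_spans = {(0, 2), (5, 7), (9, 12), (20, 22), (30, 32)}
p, r, f = span_prf(gold_spans, pred_spans)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # prints P=0.800 R=1.000 F=0.889
```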
This article is indexed in VIP (维普) and other databases.