Automatic Word Segmentation of Pre-Qin Classics Using the Sinological Index Series (《汉学引得丛刊》) as a Domain Lexicon
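Citation: Huang Shuiqing, Wang Dongbo, He Lin. 以《汉学引得丛刊》为领域词表的先秦典籍自动分词探讨 [Automatic word segmentation of pre-Qin classics using the Sinological Index Series as a domain lexicon][J]. 图书情报工作 (Library and Information Service), 2015, 59(11): 127-133.
Authors: Huang Shuiqing  Wang Dongbo  He Lin
Affiliation: College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095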
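Abstract: [Purpose/significance] With the rise of humanities computing, this paper investigates automatic word segmentation of pre-Qin texts so that knowledge can be mined from ancient classics more deeply and accurately. [Method/process] A vocabulary list is compiled from the Index to Books Quoted in the Commentaries on the Chunqiu Classic and Its Traditions (《春秋经传注疏引书引得》) in the Sinological Index Series. On training and test corpora drawn from Zuo's Commentary on the Spring and Autumn Annals (《春秋左氏传》) and the Spring and Autumn Annals of Master Yan (《晏子春秋》), a conditional random field (CRF) model is trained with feature templates selected through a combination of statistical analysis and manual introspection. [Result/conclusion] Following the complete segmentation pipeline for pre-Qin texts, segmentation models are obtained under simple, internal, and combined feature templates. The best model reaches an F-measure (the harmonic mean of precision and recall) of 97.47%, indicating strong potential for wider application. Incorporating both internal and external feature knowledge while building the model effectively improves precision and recall.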
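As an illustration only (not the authors' code), the following minimal Python sketch shows one common way to prepare character-level CRF input for a segmenter of this kind. It assumes a BMES tag set (begin/middle/end of a multi-character word, single-character word), a ±2-character window in the style of CRF++ unigram templates for the "simple" features, and a hypothetical `LEX` feature that marks characters covered by domain-lexicon entries (found by forward maximum matching) as a stand-in for the external glossary knowledge. The abstract does not give the paper's actual tag set or templates, and all function names here are hypothetical.

```python
from typing import Dict, List, Set


def words_to_bmes(words: List[str]) -> List[str]:
    """Tag each character: S = single-character word; B/M/E = begin/middle/end of a longer word."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def lexicon_tags(chars: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Hypothetical external feature: BMES position of each character inside a lexicon
    entry found by forward maximum matching, or "O" if no entry covers it."""
    feats = ["O"] * len(chars)
    i = 0
    while i < len(chars):
        # Try the longest candidate first (at least 2 characters).
        for n in range(min(max_len, len(chars) - i), 1, -1):
            if chars[i:i + n] in lexicon:
                feats[i:i + n] = words_to_bmes([chars[i:i + n]])
                i += n
                break
        else:  # no lexicon entry starts at position i
            i += 1
    return feats


def char_features(chars: str, lex: List[str], i: int) -> Dict[str, str]:
    """Window features for character i, similar to CRF++ templates such as %x[-1,0]."""
    def c(k: int) -> str:
        j = i + k
        if j < 0:
            return "<s>"
        if j >= len(chars):
            return "</s>"
        return chars[j]

    feats = {f"C{k}": c(k) for k in range(-2, 3)}                          # unigrams
    feats.update({f"C{k}C{k + 1}": c(k) + c(k + 1) for k in range(-2, 2)})  # bigrams
    feats["LEX"] = lex[i]                                                   # lexicon feature
    return feats


if __name__ == "__main__":
    words = ["晏子", "使", "楚"]
    chars = "".join(words)
    print(words_to_bmes(words))          # ['B', 'E', 'S', 'S']
    lex = lexicon_tags(chars, {"晏子"})
    print(lex)                           # ['B', 'E', 'O', 'O']
    X = [char_features(chars, lex, i) for i in range(len(chars))]
    print(X[1]["C-1C0"], X[1]["LEX"])    # 晏子 E
```

Each sentence thus becomes a list of feature dicts paired with its BMES tags. That list-of-dicts format is what `sklearn_crfsuite.CRF.fit` accepts, although any CRF toolkit (CRF++ with a template file, for example) could be used instead.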

Keywords: humanities computing  Sinological Index Series (《汉学引得丛刊》)  conditional random field model  feature template
Received: 2015-05-12

Exploring of Word Segmentation for Fore-Qin Literature Based on the Domain Glossary of Sinological Index Series
Huang Shuiqing,Wang Dongbo,He Lin.Exploring of Word Segmentation for Fore-Qin Literature Based on the Domain Glossary of Sinological Index Series[J].Library and Information Service,2015,59(11):127-133.
Authors:Huang Shuiqing  Wang Dongbo  He Lin
Institution:College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095
Abstract: [Purpose/significance] With the rise of humanities computing, this paper performs automatic word segmentation of pre-Qin literature in order to mine knowledge from ancient classics more deeply and accurately. [Method/process] Using a domain glossary drawn from the Sinological Index Series index to the Chunqiu classic and its commentaries, the paper segments pre-Qin literature with conditional random fields on training and test corpora drawn from Zuo Commentary and Yanzi's Spring and Autumn Annals, with feature templates determined through statistical analysis and manual introspection. [Result/conclusion] Within this segmentation framework for pre-Qin literature, models based on simple, internal, and combined feature templates are obtained. The best model reaches an F-measure of 97.47%, indicating good potential for wider application. In constructing the model, incorporating internal and external feature knowledge effectively improves both precision and recall.
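The F-measure reported above is the harmonic mean of precision P and recall R, F = 2PR / (P + R). The abstract does not describe the exact scoring procedure, so the sketch below assumes the usual word-level convention for Chinese segmentation: a predicted word counts as correct only if both of its boundaries match the gold standard, and counts are summed over the whole test corpus before the ratios are taken. The function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to the set of (start, end) character offsets of its words."""
    out: Set[Tuple[int, int]] = set()
    pos = 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def prf(pairs: Iterable[Tuple[List[str], List[str]]]) -> Tuple[float, float, float]:
    """Corpus-level word precision, recall and F-measure over (gold, predicted) pairs."""
    correct = n_gold = n_pred = 0
    for gold, pred in pairs:
        if "".join(gold) != "".join(pred):
            raise ValueError("gold and predicted segmentations cover different text")
        g, p = spans(gold), spans(pred)
        correct += len(g & p)  # words whose start and end both match
        n_gold += len(g)
        n_pred += len(p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    p, r, f = prf([(["晏子", "使", "楚"], ["晏", "子", "使", "楚"])])
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

In the demo, two of the four predicted words (使 and 楚) match the gold segmentation 晏子/使/楚, so P = 2/4, R = 2/3 and F = 4/7 ≈ 0.571.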
Keywords:humanities computing  Sinological Index Series  conditional random fields  feature template  