
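A Novel Knowledge Extraction Approach Oriented on Unstructured Information of Chinese Online Encyclopedia
Cite this article: Wang Ting, Ji Fujun, Xu Tiansheng. A Novel Knowledge Extraction Approach Oriented on Unstructured Information of Chinese Online Encyclopedia[J]. Library and Information Service, 2016, 60(13): 126-133.
Authors: Wang Ting, Ji Fujun, Xu Tiansheng
Affiliation: Information School, Capital University of Economics and Business, Beijing 100070
Funding: This work is one of the research outcomes of the Capital University of Economics and Business research project "Research on Key Technologies for Constructing Chinese Linked Data" (project no. 00791654490223) and the Beijing Social Science Fund project "Research on the Influence of Micro-media on Changes in the Behavior Patterns of Beijing University Students" (project no. 15ZHB011).
Abstract: [Purpose/Significance] When building a large-scale knowledge base, manual construction is inefficient and of limited feasibility, so the automatic acquisition of massive knowledge from online encyclopedias has drawn attention from a growing number of researchers. Current research focuses mainly on extracting massive knowledge from English online encyclopedia sources, while research on knowledge extraction from Chinese encyclopedia sources is still at an early stage. [Method/Process] To address the construction of large-scale Chinese knowledge bases, a new method is proposed for automatically building a large-scale knowledge base on the structure of Chinese online encyclopedias. In the first stage, self-expanding learning is performed on the semantic relations between the subjects and objects of knowledge triples. In the second stage, a cooperative classifier based on conditional random fields and a support vector machine predicts the semantic relations between the annotated attributes and attribute-value entities. [Result/Conclusion] Experimental evaluation shows that, compared with previous work, the method improves entity recognition precision and recall on typical Chinese encyclopedia category pages by up to about 10% and 6%, respectively.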

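Keywords: Chinese knowledge base; open online encyclopedia; new word discovery; conditional random fields; support vector machine
Received: 2016-05-10
Revised: 2016-06-24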

A Novel Knowledge Extraction Approach Oriented on Unstructured Information of Chinese Online Encyclopedia
Wang Ting, Ji Fujun, Xu Tiansheng. A Novel Knowledge Extraction Approach Oriented on Unstructured Information of Chinese Online Encyclopedia[J]. Library and Information Service, 2016, 60(13): 126-133.
Authors: Wang Ting, Ji Fujun, Xu Tiansheng
Institution: Information School, Capital University of Economics and Business, Beijing 100070
Abstract: [Purpose/significance] In constructing large-scale knowledge bases, manual construction lacks efficiency and flexibility, so the automatic extraction of massive knowledge from online encyclopedias has attracted the attention of a growing number of scholars. Current research mainly focuses on extracting data from English online encyclopedias, whereas research on knowledge extraction from data sources in Chinese or other languages is rare. [Method/process] This paper proposes a scheme for automatically constructing a large-scale knowledge base from Chinese online encyclopedias. (i) In the first stage, self-expanded learning is performed on the semantic relations between subjects and objects in the knowledge triples. (ii) In the second stage, the semantic relationship between marked attributes and their entities is predicted using a classifier based on Conditional Random Fields (CRFs) and a Support Vector Machine (SVM). [Result/conclusion] A large-scale knowledge base is automatically constructed with the scheme, and the experimental results indicate that the scheme is feasible and effective.
Keywords: Chinese knowledge base; online encyclopedia; new word detection; CRF; SVM
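
The first stage in the abstract, self-expanded learning of the relations between subjects and objects, is not specified in detail here. One common way to realize such a stage is bootstrapped pattern learning, and the following is a minimal sketch under that assumption, not the authors' implementation. The function names, the toy English sentences, and the rule that a relation is the text between subject and object are all hypothetical choices made for illustration.

```python
import re
from typing import List, Set, Tuple

Pair = Tuple[str, str]          # (subject, object)
Triple = Tuple[str, str, str]   # (subject, relation phrase, object)

def learn_patterns(sentences: List[str], pairs: Set[Pair]) -> Set[str]:
    """Treat the text between a known subject and object as a relation phrase."""
    patterns = set()
    for s in sentences:
        for subj, obj in pairs:
            i, j = s.find(subj), s.find(obj)
            if i != -1 and j != -1 and i + len(subj) <= j:
                middle = s[i + len(subj):j].strip()
                if middle:
                    patterns.add(middle)
    return patterns

def apply_patterns(sentences: List[str], patterns: Set[str]) -> Set[Triple]:
    """Match each known relation phrase against every sentence to get triples."""
    triples = set()
    for s in sentences:
        for p in patterns:
            m = re.fullmatch(r"(.+?)\s+" + re.escape(p) + r"\s+(.+?)\.?", s)
            if m:
                triples.add((m.group(1), p, m.group(2)))
    return triples

def bootstrap(sentences: List[str], seed_pairs: Set[Pair],
              max_rounds: int = 5) -> Tuple[Set[Triple], Set[str]]:
    """Alternate pattern learning and pattern application until nothing new appears."""
    pairs, patterns, triples = set(seed_pairs), set(), set()
    for _ in range(max_rounds):
        new_patterns = learn_patterns(sentences, pairs) - patterns
        patterns |= new_patterns
        new_triples = apply_patterns(sentences, patterns) - triples
        if not new_patterns and not new_triples:
            break
        triples |= new_triples
        # Newly found (subject, object) pairs become seeds for the next round.
        pairs |= {(subj, obj) for subj, _, obj in new_triples}
    return triples, patterns

# Toy data (hypothetical, for illustration only).
SENTENCES = [
    "Beijing is the capital of China",
    "Paris is the capital of France",
    "Paris is located in France",
    "Lyon is located in France",
]
SEEDS = {("Beijing", "China")}

triples, patterns = bootstrap(SENTENCES, SEEDS)
for t in sorted(triples):
    print(t)
```

Starting from one seed pair, the loop first learns a relation phrase, uses it to find a second pair, and then learns a further relation phrase from that pair. A real system would need to score patterns to limit semantic drift, which this sketch leaves out.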