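Construction of a Citation Sentiment Corpus for Chinese Literature
Cite this article: Xu Linhong, Ding Kun, Chen Na, Li Bing. Construction of a Citation Sentiment Corpus for Chinese Literature [J]. Journal of the China Society for Scientific and Technical Information, 2020, 39(1): 25-37.
Authors: Xu Linhong  Ding Kun  Chen Na  Li Bing
Affiliations: Institute of Science of Science and Technology Management and WISE Lab, Dalian University of Technology, Dalian 116024; Software Institute, Dalian University of Foreign Languages, Dalian 116044; Institute of Science of Science and Technology Management and WISE Lab, Dalian University of Technology, Dalian 116024
Funding: National Natural Science Foundation of China projects "Research on a Comprehensive Paper Evaluation Model Based on Citation Polarity and Review Mining" (61772103) and "Research on Multilingual Text Sentiment Analysis Methods for Social Media" (61806038)
Abstract: Content-based citation sentiment analysis overcomes a weakness of traditional citation-frequency-based methods, which treat all citations as equivalent, and it is an important research topic in citation content analysis. However, citation sentiment analysis depends on annotated datasets, and the current shortage of large-scale, high-quality citation sentiment corpora seriously constrains research in this field. Therefore, building on an analysis of how citation sentiment is expressed, this paper proposes an annotation scheme suited to representing citation sentiment and describes in detail the techniques and methods used to build the corpus. Using a combined human-machine annotation strategy and a full-featured citation annotation system, a relatively large citation sentiment corpus of Chinese literature was constructed. The statistics show that, in the fields of Chinese information processing and science and technology management, positive and negative citations account for 22% and 6% of all citations, respectively, and that the kappa value for citation sentiment annotation reaches 0.852. This indicates that the corpus objectively reflects authors' sentiment orientation and can provide data support for research in related areas such as paper evaluation, citation network analysis, and sentiment analysis.

Keywords: citation sentiment analysis  consistency validation  annotation scheme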

Corpus Construction for Citation Sentiment in Chinese Literature
Xu Linhong, Ding Kun, Chen Na and Li Bing. Corpus Construction for Citation Sentiment in Chinese Literature[J]. Journal of the China Society for Scientific and Technical Information, 2020, 39(1): 25-37.
Authors: Xu Linhong  Ding Kun  Chen Na and Li Bing
Institution:(WISE Lab,Institute of Science of Science and Technology Management,Dalian University of Technology,Dalian 116024;Software Institute,Dalian University of Foreign Languages,Dalian 116044)
Abstract: Content-based citation sentiment analysis overcomes the tendency of traditional frequency-based approaches to treat all citations as equivalent, and it is an important research topic in the field of citation content analysis. However, citation sentiment analysis relies on annotated datasets, and the lack of a large-scale, high-quality citation sentiment corpus seriously restricts research progress in this field. Therefore, based on an analysis of how citation sentiment is expressed, this paper proposes an annotation scheme for such expression and elaborates on the technology and methods of corpus construction. A large-scale citation sentiment corpus of Chinese literature was constructed using a human-computer interaction annotation strategy and a comprehensive citation annotation system. The statistical results show that positive and negative citations account for 22% and 6% of citations, respectively, and that the kappa value of citation sentiment annotation reached 0.852, indicating that this corpus objectively reflects the authors' sentiments and can provide data support for research in related fields such as paper evaluation, citation network analysis, and sentiment analysis.
Keywords:citation sentiment analysis  consistency validation  annotation scheme
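The kappa value reported in the abstract is a chance-corrected measure of agreement between annotators. The abstract does not say which variant was used, so the following is only a minimal sketch assuming Cohen's kappa for two annotators; the function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items (assumed variant)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items that received the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (illustrative only, not data from the paper).
annotator_1 = ["pos", "neu", "neu", "neg", "neu", "pos", "neu", "neu", "neu", "neg"]
annotator_2 = ["pos", "neu", "neu", "neg", "neu", "neu", "neu", "neu", "neu", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Because kappa subtracts the agreement expected by chance, a corpus dominated by neutral citations can show high raw agreement and still yield a lower kappa, which is why it is usually reported instead of plain percent agreement.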
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).