
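Research on Entity Importance Ranking in News Documents
Cite this article:Lu Na, Zhou Pengcheng, Wu Chuan. Importance Based Entity Ranking for News Documents[J]. Library and Information Service (图书情报工作), 2018, 62(11): 97-102.
Authors:Lu Na  Zhou Pengcheng  Wu Chuan
Institution:1. School of Information Science and Technology, Hainan Normal University, Haikou 571158; 2. School of Information Management, Wuhan University, Wuhan 430072
Funding:This work is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Modeling and Framework Implementation of General Entity Retrieval Based on Language Models" (Grant No. 71173164) and the National Natural Science Foundation of China Regional Science Fund project "Research on Automatic Aggregation of Negotiated Tourism Demand Based on Demand Communities" (Grant No. 71762010).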
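Abstract:[Purpose/Significance] Existing research on entity ranking for news documents is mostly document-centered or entity-centered, such as text classification and entity linking, and few studies address the importance of entities within a text. This study investigates importance-based entity ranking for news documents. [Method/Process] Given a document, we assess the importance of each entity relative to the document and rank the entities accordingly. Experiments are conducted on the Sogou full-web news dataset, and the entity rankings are evaluated with two metrics, NDCG and the inverse pair rate. [Result/Conclusion] Experimental results show that the methods based on entity frequency, TF*IDF, information entropy, and TextRank, as well as the ensemble method, all perform well, while the method based on the clustering coefficient performs only moderately. The TF*IDF-based method achieves an NDCG of 95.86%, the best result on that metric; the ensemble method achieves an inverse pair rate of 84.46%, the best result on that metric.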

Keywords:news documents  entity importance  entity ranking  
Received:2018-01-02
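
The ranking step described in the abstract (score each entity relative to its document, then sort) might look roughly like the sketch below for the TF*IDF variant. This is only an illustration under stated assumptions: the function name `rank_entities_tfidf`, the toy corpus, the pre-extracted entity lists, and the smoothed IDF formula are all hypothetical, since the abstract does not give the exact formulation used in the paper.

```python
import math
from collections import Counter


def rank_entities_tfidf(doc_entities, corpus_entities):
    """Rank the entities of one document by a TF*IDF score, highest first.

    doc_entities: list of entity mentions found in the target document.
    corpus_entities: list of entity-mention lists, one per corpus document.
    Both inputs are assumed to come from an upstream entity recognizer.
    """
    n_docs = len(corpus_entities)

    # Document frequency: number of corpus documents mentioning each entity.
    df = Counter()
    for entities in corpus_entities:
        df.update(set(entities))

    tf = Counter(doc_entities)
    total_mentions = sum(tf.values())

    scores = {}
    for entity, count in tf.items():
        # Smoothed IDF: an assumption, not the paper's stated variant.
        idf = math.log((1 + n_docs) / (1 + df[entity])) + 1.0
        scores[entity] = (count / total_mentions) * idf

    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda e: (-scores[e], e))


# Toy corpus (hypothetical): "Beijing" appears in every document, so its IDF is low.
corpus = [
    ["Beijing", "Sogou", "Beijing", "Sogou", "Haikou"],
    ["Beijing", "Wuhan University"],
    ["Beijing", "Haikou"],
]
ranking = rank_entities_tfidf(corpus[0], corpus)
print(ranking)
```

In this toy corpus "Beijing" and "Sogou" are mentioned equally often in the first document, but "Sogou" ranks higher because it is rarer across the corpus.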

Importance Based Entity Ranking for News Documents
Lu Na,Zhou Pengcheng,Wu Chuan.Importance Based Entity Ranking for News Documents[J].Library and Information Service,2018,62(11):97-102.
Authors:Lu Na  Zhou Pengcheng  Wu Chuan
Institution:1. School of Information Science and Technology, Hainan Normal University, Haikou 571158; 2. School of Information Management, Wuhan University, Wuhan 430072
Abstract:[Purpose/significance] Entities in a document differ in importance. Most existing research is document-centered or entity-centered, such as text categorization and entity linking, while few studies address the importance of entities within documents. We propose an importance-based method for entity ranking in news documents. This research has both theoretical and practical value. [Method/process] Given a document consisting of words and entities, our method computes the importance of each entity relative to the document and ranks the entities accordingly. We run experiments on the Sogou News dataset and evaluate the results with NDCG and the inverse pair rate. [Result/conclusion] Experimental results show that the methods based on entity frequency, TF*IDF, distribution entropy, and TextRank perform well, while the method based on the clustering coefficient does not. The TF*IDF method reaches an NDCG of 95.86%, the best result on that metric, and the ensemble method reaches an inverse pair rate of 84.46%, the best result on that metric.
Keywords:news documents  entity importance  entity ranking  
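
The two evaluation metrics named in the abstract can be sketched as follows. This is a hedged illustration, not the paper's evaluation code: the NDCG here uses linear gain with a log2 discount (one common variant), and the pairwise measure is computed as the share of entity pairs whose predicted order agrees with the gold importance labels. That reading is an assumption, chosen because the abstract reports a higher value as better. The function names and the toy gold labels are hypothetical.

```python
import math
from itertools import combinations


def dcg(gains):
    """Discounted cumulative gain with linear gain and a log2 discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(ranked_entities, gold):
    """NDCG of a predicted ranking; gold maps entity -> graded importance."""
    gains = [gold.get(e, 0) for e in ranked_entities]
    ideal = sorted(gold.values(), reverse=True)[:len(gains)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def pair_agreement_rate(ranked_entities, gold):
    """Share of entity pairs ordered consistently with the gold labels.

    Assumed reading of the pairwise metric: 1 minus the fraction of
    inverted pairs. Pairs tied in the gold labels are skipped.
    """
    agree = total = 0
    # combinations() keeps list order, so `first` is ranked above `second`.
    for first, second in combinations(ranked_entities, 2):
        g1, g2 = gold.get(first, 0), gold.get(second, 0)
        if g1 == g2:
            continue
        total += 1
        if g1 > g2:
            agree += 1
    return agree / total if total else 1.0


# Hypothetical gold importance labels for three entities in one document.
gold = {"A": 3, "B": 2, "C": 1}
perfect = ["A", "B", "C"]
swapped = ["B", "A", "C"]  # one inverted pair out of three

print(ndcg(perfect, gold), pair_agreement_rate(perfect, gold))
print(ndcg(swapped, gold), pair_agreement_rate(swapped, gold))
```

A single swap at the top of the list costs one of three pairs under the pairwise measure but lowers NDCG only modestly, which shows how the two metrics can favor different methods.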