
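Analysis of the Effectiveness of LDA Topic Model-Based Topic Extraction from Scientific Literature under Different Corpora
Cite this article: Guan Peng, Wang Yuefen, Fu Zhu. Analysis of the Effectiveness of LDA Topic Model-Based Topic Extraction from Scientific Literature under Different Corpora[J]. 图书情报工作 (Library and Information Service), 2016, 60(2): 112-121.
Authors: Guan Peng  Wang Yuefen  Fu Zhu
Affiliations: 1. School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094; 2. Institute of Applied Mathematics, Chaohu University, Hefei 238000
Funding: This paper is one of the research outputs of the National Natural Science Foundation of China project "Research on the Growth of Scientific Literature Dissemination Networks in New Research Fields and Its Influence on Dissemination Effects" (Grant No. 71373124) and of Anhui Provincial University Natural Science Foundation projects (Grant Nos. KJ2013B165, KJ2015A270).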
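Abstract: [Purpose/Significance] Latent Dirichlet Allocation (LDA) is used in scientific and technical intelligence analysis to discover disciplinary topics, mine research hotspots, and predict research trends. This paper evaluates LDA topic extraction on three common scientific literature text corpora (keywords, abstracts, and keywords plus abstracts) in order to reveal how well topic extraction performs on each corpus and to improve the application of LDA in scientific and technical intelligence analysis. [Method/Process] LDA topic models built on the three corpora are compared. Topic extraction is evaluated by combining quantitative analysis, based on recall, precision, F-score, and information entropy, with qualitative analysis, based on the breadth of topic extraction and topic granularity. [Result/Conclusion] An empirical study on scientific literature from the domestic wind energy field finds that, in both the quantitative and the qualitative analysis, LDA topic extraction using abstracts or keywords plus abstracts as the corpus outperforms LDA topic extraction using keywords as the corpus. Of the two, the abstract corpus performs better in breadth of topic extraction, while the keywords-plus-abstracts corpus yields finer-grained topics.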

Keywords: topic model  LDA  topic extraction  effect analysis  scientific literature
Received: 2015-12-13

Effect Analysis of Scientific Literature Topic Extraction Based on LDA Topic Model with Different Corpus
Guan Peng,Wang Yuefen,Fu Zhu.Effect Analysis of Scientific Literature Topic Extraction Based on LDA Topic Model with Different Corpus[J].Library and Information Service,2016,60(2):112-121.
Authors:Guan Peng  Wang Yuefen  Fu Zhu
Institution:1. School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094; 2. Institute of Applied Mathematics, Chaohu University, Hefei 238000
Abstract: [Purpose/significance] Latent Dirichlet Allocation (LDA) is used in scientific and technical intelligence analysis to discover subject topics, identify research hotspots, and predict development trends. This paper evaluates the effect of LDA topic extraction on three common scientific literature corpora, built from keywords, abstracts, or a mixture of keywords and abstracts. The aim is to improve the effectiveness of LDA in scientific and technical intelligence analysis. [Method/process] We compare LDA topic extraction on the three corpora and evaluate the results in two ways. One is quantitative analysis using precision, recall, F-score, and information entropy; the other is qualitative analysis along two dimensions: the extent of topic extraction and the granularity of topics. [Result/conclusion] Experiments on scientific and technical literature from the domestic wind energy field show that LDA topic extraction with abstracts or with a mixture of keywords and abstracts performs better than LDA with keywords, in both the quantitative and the qualitative analysis. LDA with abstracts and LDA with a mixture of keywords and abstracts suit different application scenarios: the former covers a broader range of topics, and the latter yields finer-grained topics.
Keywords:topic model  LDA  topic extraction  effect analysis  scientific literature  
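
The abstract names precision, recall, F-score, and information entropy as its quantitative indicators but does not give their formulas. The following is a minimal sketch of the standard textbook definitions, assuming a set-based comparison between extracted topics and a reference topic set, and Shannon entropy over a probability distribution such as a topic's word distribution. The function names and the sample data are hypothetical and are not taken from the paper.

```python
import math


def precision_recall_f(extracted, reference):
    """Set-based precision, recall, and F1 (assumed definitions, not the paper's)."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)  # topics found in both sets
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


def shannon_entropy(probs):
    """Shannon entropy in bits of a probability distribution; zero entries are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Hypothetical topic labels, for illustration only.
    extracted_topics = {"wind turbine", "grid integration", "blade design", "policy"}
    reference_topics = {"wind turbine", "grid integration", "offshore wind"}
    print(precision_recall_f(extracted_topics, reference_topics))
    print(shannon_entropy([0.5, 0.5]))
```

How the paper maps LDA output to a reference set, and which distribution its entropy is computed over, would need to be checked against the full text.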