
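Research on Literature Topic Extraction Based on Semantic and Citation Weighting
Cite this article:Yang Chunyan, Pan Youneng, Zhao Li. Research on Literature Topic Extraction Based on Semantic and Citation Weighting[J]. Library and Information Service, 2016, 60(9): 131.
Authors:Yang Chunyan  Pan Youneng  Zhao Li
Institution:1. Library and Information Center, Ningbo University, Ningbo 315211; 2. College of Public Administration, Zhejiang University, Hangzhou 310028
Funding:This research is one of the outcomes of the National Social Science Fund of China project "Research on Knowledge Organization and Service Standards for Academic Big Data" (project No. 15FTQ002).
Abstract:[Purpose/significance] Traditional topic extraction methods derive a document's topics mainly from its keywords, abstract, or full text, so the extracted topics are incomplete or contain noise. Starting from the semantics of the document content and incorporating citation content allows the topics of multiple documents to be extracted more accurately. [Method/process] This paper proposes a multi-document topic extraction algorithm for scientific literature based on semantic and citation weighting. It builds a Labeled-LDA topic model from the documents' citation content and keywords to form document-topic probability vectors, then clusters the documents with the K-means method and extracts the topics of each cluster. [Result/conclusion] Data from the PubMed biomedical database serve as experimental data to test the reliability of the method. The results show that it can extract the topics of multiple documents accurately and comprehensively.

Keywords:Labeled-LDA model  citation content  topic extraction
Received:2016-01-04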

Study on Topic Extraction of Literatures Based on Weighted Semantic and Citation Relation
Yang Chunyan,Pan Youneng,Zhao Li.Study on Topic Extraction of Literatures Based on Weighted Semantic and Citation Relation[J].Library and Information Service,2016,60(9):131.
Authors:Yang Chunyan  Pan Youneng  Zhao Li
Institution:1. Library and Information Center, Ningbo University, Ningbo 315211; 2. College of Public Administration, Zhejiang University, Hangzhou 310028
Abstract:[Purpose/significance] Traditional topic extraction methods derive the themes of literature mainly from keywords, abstracts, and full texts, but their results are incomplete or noisy. A method that starts from the semantics of the literature content and combines it with citation content can extract themes more accurately. [Method/process] This article proposes a multi-document literature topic extraction algorithm based on weighted semantic and citation relations. It builds a Labeled-LDA topic model from the citation content and keywords of the literature to obtain document-topic probability distributions, then clusters the documents with the K-means method and extracts the topics of each cluster. [Result/conclusion] The test data for the experiment were downloaded from the PubMed database. The results show that the method can accurately extract the themes of the literature.
Keywords:Labeled-LDA Model  citation content  topic extraction  
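
The clustering stage described in the abstract (document-topic probability vectors grouped by K-means, then a topic read off for each cluster) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the Labeled-LDA step has already been run elsewhere and has produced one probability vector per document. The topic names, the probability values, the farthest-point initialisation, and the helper names `kmeans` and `cluster_topics` are all hypothetical choices made for this illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(vectors, k, iterations=50):
    """Plain Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing centroid.
        seed = max(vectors, key=lambda v: min(dist(v, c) for c in centroids))
        centroids.append(list(seed))

    labels = None
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_topics(centroids, topic_names, top_n=1):
    """For each cluster, return the top_n topic names by mean probability."""
    result = []
    for centroid in centroids:
        ranked = sorted(range(len(centroid)), key=lambda t: centroid[t], reverse=True)
        result.append([topic_names[t] for t in ranked[:top_n]])
    return result


# Hypothetical topic labels and document-topic vectors (assumed Labeled-LDA output).
topic_names = ["gene expression", "drug therapy", "imaging"]
doc_topic = [
    [0.80, 0.15, 0.05],
    [0.75, 0.20, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.80, 0.15],
    [0.70, 0.10, 0.20],
    [0.15, 0.75, 0.10],
]

labels, centroids = kmeans(doc_topic, k=2)
summary = cluster_topics(centroids, topic_names)
for cluster_id, topics in enumerate(summary):
    print(cluster_id, topics)
```

In this toy setting each cluster's centroid is the average topic distribution of its documents, so the highest-probability entries of the centroid serve as a simple stand-in for "the topic content of each document class". The abstract does not say how the paper weights citation content or how it selects the per-cluster topics, so those details are left out here.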