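Automatic Summarization Based on Bag of Word Vector
Cite this article: Bai Shuxia, Bao Yulai, Zhang Hui. Automatic Summarization Based on Bag of Word Vector [J]. 现代情报, 2017, 37(2): 8-13.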
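Authors: Bai Shuxia  Bao Yulai  Zhang Hui
Affiliations: 1. Library, Inner Mongolia University, Hohhot 010021, Inner Mongolia, China; 2. School of Computer Science, Inner Mongolia University, Hohhot 010021, Inner Mongolia, China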
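Funding: National Natural Science Foundation of China project "Research on the Integration Mechanism of Mongolian Digital Resources Based on Domain Ontology" (Grant No. 71163029).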
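Abstract: [Purpose] To study an automatic summarization method based on bags of word vectors, using a vector space to describe semantic information. [Methods] A summary is a shortened, accurate expression of a document's content, and a bag of word vectors can represent words, phrases, sentences, paragraphs, and whole documents in the same vector space, where spatial distance reflects semantic similarity. The paper proposes an automatic summarization method based on bags of word vectors: the distance between bag-of-word-vector representations measures the semantic similarity between each sentence and the whole document, and the sentences semantically similar to the document are extracted to form the summary. [Results] Experiments on the DUC01 dataset show that the method generates high-quality summaries and clearly outperforms the other methods compared. [Conclusions] The experiments show that the method clearly improves automatic summarization performance.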

Keywords: word vector  bag of word vectors  automatic summarization

Automatic Summarization Based on Bag of Word Vector
Authors: Bai Shuxia  Bao Yulai  Zhang Hui
Institution: 1. Library, Inner Mongolia University, Hohhot 010021, China; 2. School of Computer Science, Inner Mongolia University, Hohhot 010021, China
Abstract: [Purpose] This work studies automatic summarization that uses a vector space to describe semantics. [Methods] The paper proposes a new representation built on word vectors, called bag of word vector (BOWV), and applies it to automatic summarization. With BOWV, words, phrases, sentences, paragraphs, and documents can all be represented in the same vector space, and the distance between representations reflects their semantic similarity. For summarization, the distance between BOWVs measures the semantic similarity between each sentence and the whole document, and the sentences most similar to the document are extracted to form the summary. [Findings] Experimental results on the DUC01 dataset show that the proposed method generates high-quality summaries and outperforms the comparison methods. [Conclusions] The experiments show that the method significantly improves the performance of automatic summarization.
Keywords: word vector  bag of word vector  automatic summarization
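
The extraction step described in the abstract (score each sentence by its closeness to the whole document in a shared vector space, then keep the closest sentences) can be illustrated with a minimal sketch. The abstract does not specify how a BOWV is built or which distance is used, so the sketch below substitutes a simple stand-in: the mean of word vectors as the representation and cosine similarity as the closeness measure. The toy embeddings and the names `TOY_VECTORS`, `embed`, `cosine`, and `summarize` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical 3-dimensional toy embeddings, for illustration only.
# Real word vectors would come from a trained embedding model.
TOY_VECTORS = {
    "cats": [1.0, 0.0, 0.0],
    "dogs": [0.9, 0.1, 0.0],
    "pets": [0.8, 0.2, 0.0],
    "stocks": [0.0, 0.0, 1.0],
    "fell": [0.0, 0.1, 0.9],
    "are": [0.1, 0.1, 0.1],
}


def embed(tokens, vectors):
    """Mean of the known word vectors in `tokens`.

    Assumption: this average is a stand-in for the paper's BOWV,
    whose exact construction the abstract does not give.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, vectors, k=1):
    """Return the k sentences most similar to the whole document,
    kept in their original order."""
    tokenized = [s.lower().split() for s in sentences]
    doc_vec = embed([t for toks in tokenized for t in toks], vectors)
    if doc_vec is None:
        return []
    scored = []
    for idx, toks in enumerate(tokenized):
        vec = embed(toks, vectors)
        # Sentences with no known words get the lowest possible score.
        score = cosine(vec, doc_vec) if vec is not None else -1.0
        scored.append((score, idx))
    # Highest similarity first; ties broken by earlier position.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i] for i in sorted(i for _, i in top)]


if __name__ == "__main__":
    document = ["cats are pets", "dogs are pets", "stocks fell"]
    print(summarize(document, TOY_VECTORS, k=2))
```

In this toy document two of the three sentences concern pets, so the document vector lies closer to them than to the off-topic sentence, which is therefore the one left out when `k=2`. A distance defined directly over the sets of word vectors could replace `cosine` without changing the selection loop.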