A semantic approach to extractive multi-document summarization: Applying sentence expansion for tuning of conceptual densities
Institution: 1. Xianyang Vocational Technical College, Xianyang, P. R. China; 2. China Electric Power Research Institute, Beijing, P. R. China; 3. GuiZhou University, Guizhou Provincial Key Laboratory of Public Big Data, Guiyang, P. R. China; 4. State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an, P. R. China; 5. Pedagogical University of Krakow, Podchorazych 2 St., 30-084 Kraków, Poland; 1. The Hong Kong Polytechnic University, Hong Kong, China; 2. Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; 3. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; 4. Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, China; 5. Shenzhen Key Laboratory of Visual Object Detection and Recognition, Shenzhen, China; 1. Beijing University of Posts and Telecommunications, Beijing, China; 2. Singapore Management University, Singapore; 3. Worcester Polytechnic Institute, USA; 4. Alibaba Group, Hangzhou, China
Abstract: Today, given the vast amount of textual data, automated extractive text summarization is one of the most common and practical techniques for organizing information. Extractive summarization selects the most appropriate sentences from a text and provides a representative summary. Sentences, as individual textual units, are usually too short for major text processing techniques to perform well. Hence, it is vital to bridge the gap between short text units and conventional text processing methods.

In this study, we propose a semantic method for implementing an extractive multi-document summarizer that combines statistical, machine-learning-based, and graph-based methods. The system is language-independent and unsupervised. The proposed framework learns the semantic representation of words from a set of given documents via the word2vec method. It expands each sentence, through an innovative method, with the most informative and least redundant words related to the main topic of the sentence. Sentence expansion implicitly performs word sense disambiguation and tunes the conceptual densities towards the central topic of each sentence. The framework then estimates the importance of sentences by using a graph representation of the documents. To identify the most important topics of the documents, we propose an inventive clustering approach that autonomously determines the number of clusters and their initial centroids, and clusters sentences accordingly. The system selects the best sentences from appropriate clusters for the final summary with respect to information salience, minimum redundancy, and adequate coverage.

A set of extensive experiments on the DUC2002 and DUC2006 datasets was conducted to investigate the proposed scheme. Experimental results showed that the proposed sentence expansion algorithm and clustering approach could considerably enhance the performance of the summarization system. Comparative experiments also demonstrated that the proposed framework outperforms most state-of-the-art summarizer systems and can substantially assist the task of extractive text summarization.
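
The graph-based importance estimation and redundancy-aware selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain bag-of-words vectors in place of word2vec-expanded sentences, a TextRank-style power iteration as the graph ranking, and a simple similarity threshold in place of the proposed clustering. The names `rank_sentences`, `summarize`, `damping`, and `redundancy_threshold` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with a TextRank-style iteration over a similarity graph.

    Assumption: edges are weighted by bag-of-words cosine similarity; the
    paper instead builds on word2vec-based sentence expansion.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Symmetric similarity matrix with no self-loops.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Sentences with no neighbours pass on no score in this sketch.
            incoming = sum(sim[j][i] / out_weight[j] * scores[j]
                           for j in range(n) if out_weight[j] > 0)
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores, vectors


def summarize(sentences, max_sentences=2, redundancy_threshold=0.5):
    """Greedily pick high-scoring sentences, skipping near-duplicates."""
    if not sentences:
        return []
    scores, vectors = rank_sentences(sentences)
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        # Keep a sentence only if it is dissimilar to everything chosen so far.
        if all(cosine(vectors[i], vectors[j]) < redundancy_threshold
               for j in chosen):
            chosen.append(i)
        if len(chosen) == max_sentences:
            break
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(list_of_sentences, max_sentences=2)` returns at most two sentences in document order. The threshold-based filter is a stand-in for the "minimum redundancy" criterion only; the coverage criterion, which the abstract attributes to cluster-aware selection, is not modelled here.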
This article is indexed in ScienceDirect and other databases.