Web document summarization by exploiting social context with matrix co-factorization |
| |
Authors: | Minh-Tien Nguyen, Viet Cuong Tran, Xuan Hoai Nguyen, Le-Minh Nguyen |
| |
Institution: | 1. Faculty of Information Technology, Hung Yen University of Technology and Education, Hung Yen, Vietnam; 2. School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), 1-1 Asahidai, Nomi, Ishikawa, 923-1292, Japan; 3. Hanoi University of Science and Technology, Hanoi, Vietnam; 4. AI Academy Vietnam, 489 Hoang Quoc Viet Rd, Hanoi, Vietnam |
| |
Abstract: | In the context of social media, users usually post relevant information corresponding to the contents of events mentioned in a Web document. This information has two valuable properties: (i) it reflects the content of an event and (ii) it shares hidden topics with sentences in the main document. In this paper, we present a novel model that captures how document sentences and post information (comments or tweets) share hidden topics, and uses the relevant post information to summarize Web documents. Unlike previous methods, which are usually based on hand-crafted features, our approach ranks document sentences and user posts by their importance to the topics. The sentence-user-post relation is formulated in a shared topic matrix, which represents their mutual reinforcement. Our proposed matrix co-factorization algorithm computes the score of each document sentence and user post and extracts the top-ranked document sentences and comments (or tweets) as a summary. We apply the model to three social context summarization datasets in two languages, English and Vietnamese, and also to DUC 2004 (a standard corpus for the traditional summarization task). According to the experimental results, our model significantly outperforms basic matrix factorization and achieves ROUGE scores competitive with state-of-the-art methods. |
| |
Keywords: | Data mining; Information retrieval; Document summarization; Social context summarization; Matrix factorization |
This article is indexed in ScienceDirect and other databases. |
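
The abstract describes ranking sentences and posts through a topic matrix they share. As a rough illustration only, the sketch below jointly factorizes a sentence-term matrix and a post-term matrix against one shared topic-term matrix using standard non-negative multiplicative updates. This is not the authors' algorithm, which the abstract does not specify: the squared-error objective, the weight `lam`, the row-sum scoring rule, and every function name here are assumptions made for the example. Plain Python lists are used to keep it dependency-free.

```python
import random

def matmul(A, B):
    """Product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def update(M, num, den, eps=1e-9):
    """Element-wise multiplicative update M * num / den; entries stay non-negative."""
    return [[m * n / (d + eps) for m, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]

def add_weighted(A, B, lam):
    """Element-wise A + lam * B."""
    return [[a + lam * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def loss(X, W, H):
    """Squared reconstruction error ||X - W H||^2."""
    R = matmul(W, H)
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))

def cofactorize(Xs, Xp, k=2, lam=1.0, iters=200, seed=0):
    """Assumed objective: ||Xs - Ws H||^2 + lam * ||Xp - Wp H||^2, with H shared.

    Xs: sentence-term matrix, Xp: post-term matrix (same vocabulary, non-negative).
    Returns sentence-topic Ws, post-topic Wp, and the shared topic-term matrix H.
    """
    rng = random.Random(seed)
    rand = lambda r, c: [[rng.random() + 0.1 for _ in range(c)] for _ in range(r)]
    Ws, Wp, H = rand(len(Xs), k), rand(len(Xp), k), rand(k, len(Xs[0]))
    for _ in range(iters):
        # Update each side's topic weights against the current shared topics.
        Ht = transpose(H)
        HHt = matmul(H, Ht)
        Ws = update(Ws, matmul(Xs, Ht), matmul(Ws, HHt))
        Wp = update(Wp, matmul(Xp, Ht), matmul(Wp, HHt))
        # Update the shared topics using evidence from both sentences and posts.
        Wst, Wpt = transpose(Ws), transpose(Wp)
        num = add_weighted(matmul(Wst, Xs), matmul(Wpt, Xp), lam)
        den = add_weighted(matmul(matmul(Wst, Ws), H),
                           matmul(matmul(Wpt, Wp), H), lam)
        H = update(H, num, den)
    return Ws, Wp, H

def rank(W):
    """Indices sorted by total topic weight (a simplistic, assumed scoring rule)."""
    return sorted(range(len(W)), key=lambda i: sum(W[i]), reverse=True)
```

Calling `cofactorize(Xs, Xp)` and then `rank(Ws)` and `rank(Wp)` gives an ordering of sentences and posts from which the top few could be taken as a summary. Because the topic matrix is shared, posts influence which topics the sentences are scored against and vice versa, which is one way to read the "mutual reinforcement" the abstract mentions.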
|