MRC-Sum: An MRC framework for extractive summarization of academic articles in natural sciences and medicine
Institution:1. College of Economics and Management, Fujian Agriculture and Forestry University, Fuzhou 350002, China;2. School of Management, Nanjing University of Posts and Telecommunications, Nanjing 210003, China;3. Business Administration Department, Applied College, Najran University, Najran, Saudi Arabia;4. Shariaa, Educational and Humanities Research Center (SEHRC), Najran University, Najran, Saudi Arabia;5. Department of Industrial & Systems Engineering, College of Engineering, Princess Nourah Bint Abdulrahman University, P.O.Box 84428, Riyadh 11671, Saudi Arabia;6. Department of Industrial Engineering, College of Engineering in Al-Qunfudah, Umm Al-Qura University, Makkah 21955, Saudi Arabia;1. College of Economics, Shenzhen University, Shenzhen, Guangdong 518060, China;2. School of Management, Huazhong University of Science and Technology, Wuhan 430074, China;1. Earthquake Research Center, Ferdowsi University of Mashhad, Iran;2. Department of Knowledge and Information Science, Ferdowsi University of Mashhad, Iran
Abstract:Extractive summarization of academic articles in natural sciences and medicine has long attracted attention. However, most existing extractive summarization models process academic articles with sentence classification models, which struggle to produce comprehensive summaries. To address this issue, we take a new view of extractive summarization of academic articles in natural sciences and medicine by treating it as a question-answering process. We propose a novel framework, MRC-Sum, in which extractive summarization of such articles is cast as an MRC (Machine Reading Comprehension) task. To instantiate MRC-Sum, article-summary pairs in the summarization datasets are first reconstructed into (Question, Answer, Context) triples for the MRC task. Several questions are designed to cover the main aspects (e.g., Background, Method, Result, Conclusion) of articles in natural sciences and medicine. A novel strategy is proposed to handle the absence of ground-truth answer spans. MRC-Sum is then trained on the reconstructed datasets using large-scale pre-trained models. During the inference stage, MRC-Sum produces four answer spans for the predefined questions, which are concatenated to form the final summary of each article. Experiments on three publicly available benchmarks, i.e., the Covid, PubMed, and arXiv datasets, demonstrate the effectiveness of MRC-Sum. Specifically, MRC-Sum outperforms advanced extractive summarization baselines on the Covid dataset and achieves competitive results on the PubMed and arXiv datasets. We also propose a novel metric, COMPREHS, to automatically evaluate the comprehensiveness of system summaries for academic articles in natural sciences and medicine. Extensive experiments verify the reliability of the proposed metric, and the COMPREHS results show that MRC-Sum generates more comprehensive summaries than the baseline models.
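The inference stage described in the abstract (one predefined question per aspect, one answer span per question, spans concatenated into the summary) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The question wording, the `MRCExample` type, and the function names are hypothetical, and `toy_answer_fn` is a keyword-overlap stand-in for the trained MRC model, which the abstract does not specify in detail.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical question wording: the abstract names only the four aspects.
ASPECT_QUESTIONS: Dict[str, str] = {
    "Background": "What is the background of the study?",
    "Method": "What method does the study use?",
    "Result": "What are the results of the study?",
    "Conclusion": "What is the conclusion of the study?",
}

# Small stopword list used only by the toy answer function below.
STOPWORDS = {"what", "is", "the", "of", "does", "are", "study", "use"}


@dataclass
class MRCExample:
    """A (Question, Answer, Context) triple; the answer is empty at inference."""
    question: str
    context: str
    answer: str = ""


def build_inference_examples(article: str) -> List[MRCExample]:
    """Pair the article (the context) with each predefined question."""
    return [MRCExample(question=q, context=article)
            for q in ASPECT_QUESTIONS.values()]


def toy_answer_fn(question: str, context: str) -> str:
    """Stand-in for an MRC model (an assumption, not the paper's model):
    return the sentence sharing the most non-stopword terms with the question."""
    q_words = set(re.findall(r"[a-z]+", question.lower())) - STOPWORDS
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context)
                 if s.strip()]
    if not sentences:
        return ""
    return max(
        sentences,
        key=lambda s: len(q_words & set(re.findall(r"[a-z]+", s.lower()))),
    )


def summarize(article: str, answer_fn: Callable[[str, str], str]) -> str:
    """Ask every predefined question and concatenate the answer spans."""
    spans = [answer_fn(ex.question, ex.context)
             for ex in build_inference_examples(article)]
    return " ".join(s.strip() for s in spans if s.strip())


if __name__ == "__main__":
    article = (
        "In conclusion it works. The results show gains. "
        "Our method uses reading comprehension. "
        "The background is that summaries matter."
    )
    print(summarize(article, toy_answer_fn))
```

Passing the answer function as a parameter keeps the concatenation step independent of the model, so a real span-extraction model could replace the toy heuristic without changing `summarize`.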
Keywords:Extractive summarization; Machine reading comprehension; Deep learning; Academic articles
This article is indexed in ScienceDirect and other databases.