A multi-cascaded model with data augmentation for enhanced paraphrase detection in short texts

Institution:	1. Bitlis Eren University, Informatics Department, 13000 Bitlis, Turkey; 2. İnönü University, Department of Computer Engineering, 44000 Malatya, Turkey; 1. Dept. of Computing and Numerical Analysis, University of Córdoba, Córdoba, Spain; 2. Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Reina Sofia University Hospital, Córdoba, Spain; 3. General and Digestive Surgery, San Juan de Dios Hospital, Córdoba, Spain; 1. Computer Science Department, Universidad Carlos III de Madrid, Spain; 2. Faculty of Computer Sciences, Østfold University College, Norway; 3. Foundation for Biomedical Research, Príncipe de Asturias Hospital, Spain; 4. Computer Science Department, Universidad de Alcalá, Spain; 1. Department of Computer Information Systems, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia; 2. King Fahd University Hospital, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia; 3. Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia; 1. School of Remote Sensing and Information Engineering, Wuhan University, #129 Luoyu Road, Wuhan, Hubei, PR China; 2. School of Software, South China Normal University, #55 West of Zhongshan Avenue, Guangzhou, Guangdong, PR China; 3. School of Printing and Packaging, Wuhan University, #129 Luoyu Road, Wuhan, Hubei, PR China; 4. Suzhou Institute, Wuhan University, #377 Linquan Street, Suzhou, Jiangsu, PR China

Abstract:	Paraphrase detection is an important task in text analytics with numerous applications such as plagiarism detection, duplicate question identification, and enhanced customer support helpdesks. Deep models have been proposed for representing and classifying paraphrases. These models, however, require large quantities of human-labeled data, which is expensive to obtain. In this work, we present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts. Our data augmentation strategy treats the notions of paraphrase and non-paraphrase as binary relations over the set of texts. It then uses graph-theoretic concepts to efficiently generate additional paraphrase and non-paraphrase pairs in a sound manner. Our multi-cascaded model employs three supervised feature learners (cascades) based on CNN and LSTM networks with and without soft attention. The learned features, together with hand-crafted linguistic features, are then forwarded to a discriminator network for final classification. Our model is both wide and deep and provides greater robustness across clean and noisy short texts. We evaluate our approach on three benchmark datasets and show that it achieves comparable or state-of-the-art performance on all three.
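The abstract does not spell out the augmentation procedure, so the following is only a minimal sketch of one way the idea could work, not the authors' algorithm. It assumes that "paraphrase" behaves as an equivalence relation (symmetric and transitive), so that texts linked by labeled paraphrase pairs form connected components of a graph, and that a labeled non-paraphrase pair separates the two components it touches. The function name `augment_pairs` and the input format (lists of string pairs) are hypothetical.

```python
from itertools import combinations


def augment_pairs(paraphrases, non_paraphrases):
    """Infer extra paraphrase / non-paraphrase pairs from labeled pairs.

    Assumption (not taken from the paper): paraphrase is symmetric and
    transitive, so each connected component of the paraphrase graph is a
    set of mutually paraphrasing texts.
    Returns (extra_pos, extra_neg) as sets of frozensets of texts.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Build components from the labeled paraphrase pairs.
    for a, b in paraphrases:
        union(a, b)
    # Register texts that only appear in non-paraphrase pairs.
    for a, b in non_paraphrases:
        find(a)
        find(b)

    groups = {}
    for text in list(parent):
        groups.setdefault(find(text), set()).add(text)

    known_pos = {frozenset(p) for p in paraphrases}
    known_neg = {frozenset(p) for p in non_paraphrases}

    # Every pair inside a component is a paraphrase (transitive closure).
    all_pos = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            all_pos.add(frozenset((a, b)))

    # A non-paraphrase label between two components extends to all
    # cross-component pairs.
    all_neg = set()
    for a, b in non_paraphrases:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # label contradicts the paraphrase graph; skip it
        for x in groups[ra]:
            for y in groups[rb]:
                all_neg.add(frozenset((x, y)))

    return all_pos - known_pos, all_neg - known_neg


extra_pos, extra_neg = augment_pairs(
    paraphrases=[("a", "b"), ("b", "c")],
    non_paraphrases=[("a", "d")],
)
```

In this toy input, the pair (a, c) is inferred as a paraphrase, and (b, d) and (c, d) are inferred as non-paraphrases. Real labeled data is noisy, so a practical version would need a policy for contradictory labels; this sketch simply skips them.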

Keywords:
This article is indexed in ScienceDirect and other databases.