Unsupervised dialectal neural machine translation
Institution: 1. Samsung R&D Institute Jordan, Jordan; 2. Jordan University of Science and Technology, Jordan
Abstract: In this paper, we present the first work on unsupervised dialectal Neural Machine Translation (NMT), where the source dialect is not represented in the parallel training corpus. We propose two systems for this problem. The first is the Dialectal to Standard Language Translation (D2SLT) system, which is based on the standard attentional sequence-to-sequence model and introduces two novel ideas that leverage similarities among dialects: using common words as anchor points when learning word embeddings, and a decoder scoring mechanism based on cosine similarity and language-model scores. The second system is based on the well-known Google NMT (GNMT) system. We first evaluate both systems in a supervised setting, where training and testing use our parallel corpus of Jordanian dialect and Modern Standard Arabic (MSA), before moving to the unsupervised setting, where we train each system once on a Saudi-MSA parallel corpus and once on an Egyptian-MSA parallel corpus and test it on the Jordanian-MSA parallel corpus. The highest BLEU score obtained in the unsupervised setting is 32.14 (by D2SLT trained on Saudi-MSA data), which is remarkably high given that the highest BLEU score in the supervised setting is 48.25.
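The abstract names the decoder scoring mechanism but does not specify its form. As an illustration only, the Python sketch below shows one plausible way such a rescoring could interpolate the cosine similarity between source and candidate embeddings with a language-model fluency score; the function names, the `alpha` weight, and the linear interpolation are all assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    denom = float(np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.dot(u, v)) / denom if denom else 0.0

def rescore_candidate(cand_emb: np.ndarray, src_emb: np.ndarray,
                      lm_logprob_per_token: float, alpha: float = 0.5) -> float:
    """Hypothetical rescoring: blend semantic closeness to the source
    (cosine similarity of sentence embeddings) with target-side fluency
    (a per-token language-model log-probability). Both `alpha` and the
    linear interpolation are assumptions, not the paper's exact rule."""
    return (alpha * cosine_similarity(cand_emb, src_emb)
            + (1.0 - alpha) * lm_logprob_per_token)

# Usage: pick the best hypothesis from a beam of
# (candidate_embedding, lm_logprob_per_token) pairs.
def pick_best(beam, src_emb, alpha=0.5):
    return max(beam, key=lambda h: rescore_candidate(h[0], src_emb, h[1], alpha))
```

In the same spirit, the "common words as anchor points" idea could be realized by training a single embedding space over the concatenated dialect and MSA corpora, so that vocabulary shared across the two varieties pulls their word vectors into a common space; the paper's exact training setup is not given in this record.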
This article is indexed in ScienceDirect and other databases.