Unsupervised dialectal neural machine translation
Institution: 1. Samsung R&D Institute Jordan, Jordan; 2. Jordan University of Science and Technology, Jordan
Abstract: In this paper, we present the first work on unsupervised dialectal Neural Machine Translation (NMT), where the source dialect is not represented in the parallel training corpus. We propose two systems for this problem. The first is the Dialectal to Standard Language Translation (D2SLT) system, which is based on the standard attentional sequence-to-sequence model and introduces two novel ideas that leverage similarities among dialects: using common words as anchor points when learning word embeddings, and a decoder scoring mechanism based on cosine similarity and language-model scores. The second system is based on the well-known Google NMT (GNMT) system. We first evaluate both systems in a supervised setting, where training and testing use our parallel corpus of Jordanian dialect and Modern Standard Arabic (MSA), before moving to the unsupervised setting, where we train each system once on a Saudi-MSA parallel corpus and once on an Egyptian-MSA parallel corpus and test it on the Jordanian-MSA parallel corpus. The highest BLEU score obtained in the unsupervised setting is 32.14 (by D2SLT trained on Saudi-MSA data), which is remarkably high given that the highest BLEU score in the supervised setting is 48.25.
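The abstract names the decoder scoring mechanism but does not specify its form. As an illustration only, the Python sketch below shows one plausible way such a rescoring could interpolate the cosine similarity between source and candidate embeddings with a language-model fluency score; the function names, the `alpha` weight, and the linear interpolation are all assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    denom = float(np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.dot(u, v)) / denom if denom else 0.0

def rescore_candidate(cand_emb: np.ndarray, src_emb: np.ndarray,
                      lm_logprob_per_token: float, alpha: float = 0.5) -> float:
    """Hypothetical rescoring: blend semantic closeness to the source
    (cosine similarity of sentence embeddings) with target-side fluency
    (a per-token language-model log-probability). Both `alpha` and the
    linear interpolation are assumptions, not the paper's exact rule."""
    return (alpha * cosine_similarity(cand_emb, src_emb)
            + (1.0 - alpha) * lm_logprob_per_token)

# Usage: pick the best hypothesis from a beam of
# (candidate_embedding, lm_logprob_per_token) pairs.
def pick_best(beam, src_emb, alpha=0.5):
    return max(beam, key=lambda h: rescore_candidate(h[0], src_emb, h[1], alpha))
```

In the same spirit, the "common words as anchor points" idea could be realized by training a single embedding space over the concatenated dialect and MSA corpora, so that vocabulary shared across the two varieties pulls their word vectors into a common space; the paper's exact training setup is not given in this record.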
This article is indexed in ScienceDirect and other databases.