Cyberbullying detection for low-resource languages and dialects: Review of the state of the art
Institution: 1. Earthquake Research Center, Ferdowsi University of Mashhad, Iran; 2. Department of Knowledge and Information Science, Ferdowsi University of Mashhad, Iran; 1. College of Artificial Intelligence, Beijing Information Technology College, Beijing, 100018, China; 2. College of Engineering and IT, University of Dubai, UAE; 3. Independent Researcher, USA; 4. Department of Computer Science, College of Computer and Information Sciences, Majmaah University, Al-Majmaah, 11952, Saudi Arabia; 5. Department of Electrical Engineering, College of Engineering in Al-Kharj, Prince Sattam Bin Abdulaziz University, Al-Kharj, 11942, Saudi Arabia; 1. School of Information Management, Wuhan University, Wuhan, Hubei 430072, China; 2. Center for Studies of Information Resources, Wuhan University, Wuhan, Hubei 430072, China; 3. School of Information Management, Nanjing University, Nanjing, Jiangsu 210023, China; 1. Information Research Institute of Qilu University of Technology (Shandong Academy of Sciences), Jinan, PR China; 2. School of Management, Xi'an University of Architecture and Technology, Xi'an, PR China; 3. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, PR China; 4. University of Jinan, Jinan, PR China; 1. School of Business and Management, Jilin University, Changchun, China; 2. Research Center for Big Data Management, Jilin University, Changchun, China; 3. Department of Pediatrics, The Second Hospital of Jilin University, Changchun, China; 4. Department of Information Technology & Decision Sciences, Old Dominion University, Norfolk, VA, United States
Abstract: The struggle of social media platforms to moderate content in a timely manner encourages users to abuse such platforms to spread vulgar or abusive language, which, when performed repeatedly, becomes cyberbullying: a social problem that takes place in virtual environments yet has real-world consequences for its victims, such as depression, withdrawal, or even suicide attempts. Systems for the automatic detection and mitigation of cyberbullying have been developed but, unfortunately, the vast majority of them target English, with only a handful available for low-resource languages. To estimate the present state of research and identify the needs for further development, in this paper we present a comprehensive systematic survey of studies done so far on automatic cyberbullying detection in low-resource languages. We analyzed all studies on this topic that were available. We investigated more than seventy published studies on automatic detection of cyberbullying or related language in low-resource languages and dialects that were published between around 2017 and January 2023. The paper covers 23 low-resource languages and dialects, including Bangla, Hindi, Dravidian languages and others. In the survey, we identify some of the research gaps of previous studies, which include the lack of reliable definitions of cyberbullying and its relevant subcategories, and biases in the acquisition and annotation of data. Based on those research gaps, we provide some suggestions for improving the general research conduct in cyberbullying detection, with a primary focus on low-resource languages. Following those suggestions, we collect and release a cyberbullying dataset in the Chittagonian dialect of Bangla and propose a number of initial ML solutions trained on that dataset. In addition, the pre-trained transformer-based BanglaBERT model was also tested.
We conclude with additional discussions on ethical issues regarding such studies, highlight how our survey improves on similar surveys done in the past, and discuss the usefulness of recently popular AI-enhanced tools for streamlining such scientific surveys.
Keywords: Automatic cyberbullying detection; Low-resource language; Machine learning; Social media
This article is indexed in ScienceDirect and other databases.