首页 | 本学科首页   官方微博 | 高级检索  
     检索      


ISE-Hate: A benchmark corpus for inter-faith,sectarian, and ethnic hatred detection on social media in Urdu
Abstract:Social media has become the most popular platform for free speech. This freedom of speech has given opportunities to the oppressed to raise their voice against injustices, but on the other hand, this has led to a disturbing trend of spreading hateful content of various kinds. Pakistan has been dealing with the issue of sectarian and ethnic violence for the last three decades and now due to freedom of speech, there is a growing trend of disturbing content about religion, sect, and ethnicity on social media. This necessitates the need for an automated system for the detection of controversial content on social media in Urdu which is the national language of Pakistan. The biggest hurdle that has thwarted the Urdu language processing is the scarcity of language resources, annotated datasets, and pretrained language models. In this study, we have addressed the problem of detecting Interfaith, Sectarian, and Ethnic hatred on social media in Urdu language using machine learning and deep learning techniques. In particular, we have: (1) developed and presented guidelines for annotating Urdu text with appropriate labels for two levels of classification, (2) developed a large dataset of 21,759 tweets using the developed guidelines and made it publicly available, and (3) conducted experiments to compare the performance of eight supervised machine learning and deep learning techniques, for the automated identification of hateful content. In the first step, experiments are performed for the hateful content detection as a binary classification task, and in the second step, the classification of Interfaith, Sectarian and Ethnic hatred detection is performed as a multiclass classification task. Overall, Bidirectional Encoder Representation from Transformers (BERT) proved to be the most effective technique for hateful content identification in Urdu tweets.
Keywords:Hateful content detection  Urdu  Corpus generation  Sectarian  BERT  Ethnic  Hatred
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号