EXPLORING TEXTUAL HATE SPEECH IDENTIFICATION APPROACHES AND DATASETS: A SYSTEMATIC LITERATURE REVIEW AND META-ANALYSIS

Husnain Saleem
Muhammad Javed
Muhammad Zubair Asghar
Muhammad Ahmad Jan
Maria Zuraiz
Aftab Ali
Asad Ullah

Keywords

Bias Analysis, Deep Learning, Hate Speech Identification, Hate Speech Datasets, Machine Learning, Multilingual, Multimodal

Abstract

Concerns are growing about the influence of hate speech on social discourse and its capacity to instigate violence and prejudice as it spreads across internet platforms. Identifying and regulating hate speech has therefore become a priority for researchers and service providers. In this survey, we examine studies published between 2018 and 2023 that explore various aspects of hate speech identification. The review begins by noting the alarming growth of online hate speech and its adverse effects, which underscores the importance of reliable identification mechanisms. Based on their principal focus, we classify the papers into five broad themes: dataset construction, algorithm development, bias analysis, multilingual and multimodal techniques, and ethical considerations. This systematic literature review and meta-analysis points out the shortcomings of currently available hate speech identification methods and datasets, and highlights the need for standardized evaluation metrics, more extensive datasets, and robust algorithms that can cope with the ever-evolving nature of hate speech. Researchers, legislators, and technology businesses seeking to counteract online hate speech should find this assessment a useful, in-depth overview of the hate speech identification landscape. Future research on this topic can build upon the insights and open problems presented here.

