Detection of offensive language in the Moroccan dialect using BERT-based models
Moussaoui Otman¹, Yacine El Younoussi² and Naoufal Rtayli³
² Professor, Department of Computer Engineering, ENSA of Tetouan, Abdelmalek Essaadi University, Tetouan, Morocco
³ Professor, Department of Computer Engineering, Faculty of Sciences Tetouan, Abdelmalek Essaadi University, Tetouan, Morocco
Corresponding Author: Moussaoui Otman
Received: 14-September-2024; Revised: 25-June-2025; Accepted: 04-July-2025
Abstract
Detecting offensive language and hate speech in online communication has become increasingly important as harmful content spreads rapidly on social media. The challenge is especially acute for low-resource languages such as the Moroccan dialect. This study addresses the need for effective automated systems that detect offensive language, rudeness, hate speech, and toxicity in the Moroccan dialect. Six transformer-based models were fine-tuned and evaluated for this task: Darija BERT mix (DBERT-mix), MARBERT, multilingual BERT (mBERT), Moroccan BERT (MorrBERT), cross-lingual language model - RoBERTa (XLM-R), and Moroccan RoBERTa (MorRoBERTa). DBERT-mix achieved the highest performance of the six models. The analysis also showed better performance on text written in Latin script than on text written in Arabic script, which points to the need for further optimization of models for Arabic script. These findings underline the importance of adapting models to specific dialects and scripts, and they provide insights for improving offensive language detection in the Moroccan context.
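The per-script comparison mentioned above can be illustrated with a minimal sketch. The abstract does not state how the authors separated Latin-script from Arabic-script text, so the character-counting heuristic below is an assumption, and the function names `dominant_script` and `accuracy_by_script` are hypothetical. The sketch assigns each text to its dominant script and reports accuracy separately for each group.

```python
import unicodedata
from collections import defaultdict


def dominant_script(text):
    """Return 'arabic', 'latin', or 'other' by counting alphabetic characters.

    Assumption: a text belongs to the script that contributes most of its
    letters. Digits (common in Latin-script Darija, e.g. '3', '7', '9')
    and punctuation are ignored.
    """
    counts = {"arabic": 0, "latin": 0}
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            counts["arabic"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
    if counts["arabic"] == 0 and counts["latin"] == 0:
        return "other"
    # Ties are resolved in favour of Arabic script.
    return "arabic" if counts["arabic"] >= counts["latin"] else "latin"


def accuracy_by_script(texts, gold_labels, predicted_labels):
    """Group examples by dominant script and compute accuracy per group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for text, gold, pred in zip(texts, gold_labels, predicted_labels):
        script = dominant_script(text)
        totals[script] += 1
        hits[script] += int(gold == pred)
    return {script: hits[script] / totals[script] for script in totals}
```

Given model predictions on a mixed-script test set, `accuracy_by_script(texts, gold, pred)` returns one accuracy value per script, which makes a gap between Latin-script and Arabic-script performance directly visible. Other metrics, such as F1, could be grouped the same way.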
Keywords
Offensive language detection, Hate speech classification, Moroccan dialect, Transformer models, BERT-based models, Script-specific NLP (Latin and Arabic).
Cite this article
Otman M, Younoussi YE, Rtayli N. Detection of offensive language in the Moroccan dialect using BERT-based models. International Journal of Advanced Technology and Engineering Exploration. 2025;12(128):1075-1085. DOI: 10.19101/IJATEE.2024.111101679
