Detecting Racist Text in Bengali: An Ensemble Deep Learning Framework

S. S. Saruar; Nusrat; Sadia

TL;DR

This work targets Bengali racist text detection on social media by building a dedicated Bengali dataset and applying an ensemble deep-learning framework. It combines Bi-RNN, Bi-LSTM, and a Multi-Channel CNN-LSTM (MCNN-LSTM) with three Bengali BERT-based embeddings (BanglaBERT, BanglaBERT Base, SahajBERT), achieving a peak accuracy of 87.94% on a four-class setup with implicit binary Racism detection. The study demonstrates that MCNN-LSTM often yields the strongest single-model performance and that ensemble averaging provides a modest uplift, particularly with SahajBERT embeddings. The contribution offers a targeted, scalable approach for online moderation in Bengali and highlights the need for larger, more balanced datasets to enable finer-grained multiclass classification.

Abstract

Racism is an alarming phenomenon in our country and throughout the world. We come across racist comments every day, both in everyday life and online. We can, however, work to eradicate racism from virtual life, such as social media. In this paper, we detect such racist comments using NLP and deep learning techniques. We built a novel dataset in the Bengali language, annotated it, and validated the data labels. We applied RNN and LSTM models with BERT embeddings; of these, the MCNN-LSTM model performed best. Finally, we combined the results of all models with an ensemble approach to improve overall performance, achieving an accuracy of 87.94%.
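The TL;DR describes the ensemble step as averaging the predictions of the individual models. The following is a minimal sketch of that idea in plain Python, assuming each model (Bi-RNN, Bi-LSTM, MCNN-LSTM) outputs per-class softmax probabilities and that the ensemble is unweighted soft voting. The paper's exact combination rule, any weights, and its class labels are not given here. The function names (`average_ensemble`, `argmax_labels`, `accuracy`) and the probability values are hypothetical and purely illustrative.

```python
from typing import List, Sequence

Probs = Sequence[Sequence[float]]  # [sample][class] probabilities from one model


def average_ensemble(model_probs: Sequence[Probs]) -> List[List[float]]:
    """Soft voting: average each class probability across all models."""
    if not model_probs:
        raise ValueError("at least one model's probabilities are required")
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    # Every model must score the same samples over the same class set.
    for probs in model_probs:
        if len(probs) != n_samples or any(len(row) != n_classes for row in probs):
            raise ValueError("all models must score the same samples and classes")
    return [
        [sum(probs[i][c] for probs in model_probs) / n_models for c in range(n_classes)]
        for i in range(n_samples)
    ]


def argmax_labels(probs: Probs) -> List[int]:
    """Pick the highest-probability class index for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    # Hypothetical softmax outputs for 2 samples x 4 classes
    # (illustrative numbers only, not results from the paper).
    bi_rnn = [[0.6, 0.2, 0.1, 0.1], [0.3, 0.4, 0.2, 0.1]]
    bi_lstm = [[0.5, 0.3, 0.1, 0.1], [0.2, 0.3, 0.4, 0.1]]
    mcnn_lstm = [[0.2, 0.6, 0.1, 0.1], [0.1, 0.2, 0.6, 0.1]]
    gold = [0, 2]

    for name, probs in [("Bi-RNN", bi_rnn), ("Bi-LSTM", bi_lstm), ("MCNN-LSTM", mcnn_lstm)]:
        print(name, accuracy(argmax_labels(probs), gold))
    ensemble = average_ensemble([bi_rnn, bi_lstm, mcnn_lstm])
    print("Ensemble", accuracy(argmax_labels(ensemble), gold))
```

On this toy input, the Bi-RNN and MCNN-LSTM outputs each misclassify one sample, while the averaged probabilities recover both gold labels. This illustrates how averaging can smooth out individual models' errors; the actual gain reported in the paper depends on its real model outputs.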

Paper Structure

This paper contains 20 sections, 6 figures, and 8 tables.

Introduction
Related Works
Definition of the Task
Data Acquisition
Data Collection
Annotation Process
Data Preprocessing
Data Distribution
Methodology
Feature Extraction
Model Architecture
Fine-Tuned Hyperparameters
Ensemble Approach
Experimental Results
Experiments
...and 5 more sections

Figures (6)

Figure 1: Data Survey Report
Figure 2: Dataset construction process
Figure 3: A schematic diagram of our work.
Figure 4: Architecture of our proposed MCNN-LSTM.
Figure 5: Confusion matrices of the Bi-RNN, Bi-LSTM, MCNN-LSTM, and Ensemble models with SahajBERT embeddings (left to right).
...and 1 more figure
