Explain Thyself Bully: Sentiment Aided Cyberbullying Detection with Explanation

Krishanu Maity, Prince Jha, Raghav Jain, Sriparna Saha, Pushpak Bhattacharyya

TL;DR

This work tackles cyberbullying detection in code-mixed language and the need for explanations by introducing BullyExplain, a dataset annotated with bully labels, sentiment, targets, and rationale spans, and a multitask model, mExCB, that jointly learns Bully, Sentiment, Target, and Rationales with word- and sub-sentence-level attention. The model combines Bi-GRU and CNN encoders with self-attention, using pre-trained multilingual embeddings (mBERT or VecMap) and a shared representation with task-specific heads; rationales are generated via a sigmoid output and integrated into the bully decision through the loss function. Experimental results show that incorporating sentiment and rationales improves cyberbullying detection, with three-task variants often outperforming four-task setups, and mExCB surpasses state-of-the-art baselines on the Hindi-English code-mixed data. The work demonstrates the value of explainability for trust and transparency in AI-powered moderation and highlights future directions toward multimodal extensions and broader languages.
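The way the four tasks could be tied together through a single loss can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes softmax heads for Bully, Sentiment, and Target, a per-token sigmoid head for Rationales (as the summary states), and an equally weighted sum of the per-task losses. The function names, the class counts in the usage example, and the weights are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, gold_index):
    """Negative log-likelihood of the gold class under a softmax head."""
    return -math.log(softmax(logits)[gold_index])


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rationale_bce(token_logits, gold_mask):
    """Mean binary cross-entropy over tokens; gold_mask[i] is 1 if
    token i belongs to an annotated rationale span, else 0."""
    eps = 1e-12  # guards against log(0)
    losses = []
    for z, y in zip(token_logits, gold_mask):
        p = sigmoid(z)
        losses.append(-(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)))
    return sum(losses) / len(losses)


def multitask_loss(outputs, gold, weights=None):
    """Weighted sum of per-task losses (equal weights are an assumption)."""
    if weights is None:
        weights = {"bully": 1.0, "sentiment": 1.0, "target": 1.0, "rationale": 1.0}
    parts = {
        "bully": cross_entropy(outputs["bully"], gold["bully"]),
        "sentiment": cross_entropy(outputs["sentiment"], gold["sentiment"]),
        "target": cross_entropy(outputs["target"], gold["target"]),
        "rationale": rationale_bce(outputs["rationale"], gold["rationale"]),
    }
    total = sum(weights[name] * value for name, value in parts.items())
    return total, parts


# Hypothetical head outputs for one five-token post.
outputs = {
    "bully": [0.2, 1.5],            # 2 classes (assumed)
    "sentiment": [0.1, -0.3, 0.9],  # 3 classes (assumed)
    "target": [0.0, 0.4, -1.0, 0.7],  # 4 classes (assumed)
    "rationale": [-2.0, -1.5, 2.2, 1.8, -0.5],  # one logit per token
}
gold = {"bully": 1, "sentiment": 2, "target": 3, "rationale": [0, 0, 1, 1, 0]}
total, parts = multitask_loss(outputs, gold)
```

Because the rationale term shares the summed objective with the bully term, gradients from rationale supervision would reach the shared representation, which is one plausible reading of how rationales are "integrated into the bully decision through the loss function". Dropping a key from `parts` and `weights` gives the three-task variants mentioned above.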

Abstract

Cyberbullying has become a major problem with the popularity of social media networks and online communication apps. While much research aims to develop better models for cyberbullying detection in monolingual settings, there is very little research on code-mixed languages and on the explainability aspect of cyberbullying detection. Recent regulations such as the "right to explanation" in the General Data Protection Regulation have spurred research into interpretable models rather than a sole focus on performance. Motivated by this, we develop the first interpretable multi-task model, mExCB, for automatic cyberbullying detection in code-mixed languages, which simultaneously solves several tasks: cyberbullying detection, explanation/rationale identification, target group detection, and sentiment analysis. We introduce BullyExplain, the first benchmark dataset for explainable cyberbullying detection in code-mixed language. Each post in BullyExplain is annotated with four labels: bully label, sentiment label, target, and rationales (explainability), i.e., the phrases responsible for annotating the post as bullying. The proposed multitask framework (mExCB), based on CNN and GRU with word- and sub-sentence (SS)-level attention, outperforms several baselines and state-of-the-art models on the BullyExplain dataset.

Paper Structure (23 sections, 5 equations, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Statistics of target class in our developed dataset.
  • Figure 2: Architecture of mExCB, the proposed multitask framework for explainable cyberbullying detection.