AI Enabled User-Specific Cyberbullying Severity Detection with Explainability

Tabia Tanzin Prama, Jannatul Ferdaws Amrin, Md. Mushfique Anwar, Iqbal H. Sarker

TL;DR

Cyberbullying severity detection is enhanced by integrating victim-specific psychological, demographic, and behavioral attributes with social-media comments using an LSTM model trained on 146 features. A re-labeling scheme classifies comments into Not Bullying, Mild Bullying, and Severe Bullying, while SHAP and LIME provide both global and local explanations of model decisions. The approach achieves state-of-the-art performance (accuracy $98\%$, F1 $0.97$) and reveals that demographic and behavioral factors contribute to severity, emphasizing the value of explainability for moderation decisions. The study demonstrates practical implications for safer online environments and proposes future work including browser-based interventions and multimodal content analysis.

Abstract

The rise of social media has significantly increased the prevalence of cyberbullying (CB), posing serious risks to both mental and physical well-being. Effective detection systems are essential for mitigating its impact. While several machine learning (ML) models have been developed, few incorporate victims' psychological, demographic, and behavioral factors alongside bullying comments to assess severity. In this study, we propose an AI model integrating user-specific attributes, including psychological factors (self-esteem, anxiety, depression), online behavior (internet usage, disciplinary history), and demographic attributes (race, gender, ethnicity), along with social media comments. Additionally, we introduce a re-labeling technique that takes user-specific factors into account and categorizes social media comments into three severity levels: Not Bullying, Mild Bullying, and Severe Bullying. Our LSTM model is trained on 146 features, incorporating emotional, topical, and word2vec representations of social media comments as well as user-level attributes. It outperforms existing baseline models, achieving the highest accuracy of 98% and an F1-score of 0.97. To identify key factors influencing the severity of cyberbullying, we employ explainable AI techniques (SHAP and LIME) to interpret the model's decision-making process. Our findings reveal that, beyond hate comments, victims belonging to specific racial and gender groups are more frequently targeted and exhibit higher incidences of depression, disciplinary issues, and low self-esteem. Additionally, individuals with a prior history of bullying are at a greater risk of becoming victims of cyberbullying.
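As a rough illustration of how the two reported metrics are computed for a three-class severity task, the following minimal sketch uses only the Python standard library. The abstract does not say how the F1-score is averaged across classes, so macro-averaging is an assumption here, and the toy labels are hypothetical and unrelated to the paper's dataset.

```python
from collections import Counter

# The three severity levels named in the abstract.
LABELS = ("Not Bullying", "Mild Bullying", "Severe Bullying")


def accuracy(y_true, y_pred):
    """Fraction of comments whose predicted level matches the true level."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 (macro-averaging is an assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical toy labels, for illustration only.
y_true = ["Not Bullying", "Mild Bullying", "Severe Bullying", "Severe Bullying"]
y_pred = ["Not Bullying", "Mild Bullying", "Severe Bullying", "Mild Bullying"]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

In this toy case one Severe Bullying comment is predicted as Mild Bullying, which lowers the F1 of both classes. Accuracy alone would not show which severity levels are being confused.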

Paper Structure

This paper contains 26 sections, 4 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Overview of the proposed workflow to determine the intensity of bullying and classify cyberbullying instances into "Not Bullying", "Mild Bullying", and "Severe Bullying" categories.
  • Figure 2: Data preprocessing using NLTK libraries.
  • Figure 3: Word cloud of words frequently used in bullying comments.
  • Figure 4: Architecture of the proposed LSTM model for cyberbullying severity prediction.
  • Figure 5: LIME explanations for Table \ref{LIME1} showing important words contributing to the model's decision.
  • ...and 1 more figures