A Machine Learning Approach for Detection of Mental Health Conditions and Cyberbullying from Social Media

Edward Ajayi, Martha Kachweka, Mawuli Deku, Emily Aiken

TL;DR

The paper tackles the challenge of detecting a broad set of mental health conditions and cyberbullying signals from social media by proposing a unified multiclass framework, covering ten categories, trained on Reddit and Twitter data. It shows that end-to-end fine-tuning of transformer models, particularly MentalBERT, yields the best overall performance (accuracy of 0.92 and Macro F1 of 0.76), and it demonstrates that a split-then-balance data pipeline is necessary for realistic evaluation. The work advances practical screening tools for moderators by introducing a hybrid SHAP-LLM explainability system and a prototype dashboard, the Social Media Screener, which integrates predictions and explanations into moderation workflows. It also discusses ethical considerations, limitations, and future directions, including multi-label and multilingual extensions that would better reflect real-world use cases in online safety and computational mental health.
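
One plausible reading of the gap between accuracy (0.92) and Macro F1 (0.76) is class imbalance in the test set: accuracy weights every post equally, so a large class can carry it, while Macro F1 averages per-class F1 scores so that each minority class counts as much as the majority. The minimal plain-Python sketch below computes both metrics; the label names (including the hypothetical majority class "neutral") and the toy predictions are illustrative assumptions, not the paper's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of posts whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, labels):
    """One-vs-rest F1 for each label; 0.0 when precision and recall are both 0."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in gold or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    f1 = per_class_f1(y_true, y_pred, labels)
    return sum(f1.values()) / len(labels)


# Hypothetical imbalanced test set: one large class ("neutral" is an assumed
# label) and two small, semantically related classes.
y_true = ["neutral"] * 90 + ["stress"] * 5 + ["anxiety"] * 5
# The toy model is perfect on the majority class but confuses 3 stress posts with anxiety.
y_pred = ["neutral"] * 90 + ["anxiety"] * 3 + ["stress"] * 2 + ["anxiety"] * 5

print(f"accuracy={accuracy(y_true, y_pred):.2f} macro_f1={macro_f1(y_true, y_pred):.2f}")
# prints: accuracy=0.97 macro_f1=0.78
```

Three errors out of 100 posts barely move accuracy, but because they fall in small classes they pull the stress and anxiety F1 scores, and therefore the macro average, well below it.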

Abstract

Mental health challenges and cyberbullying are increasingly prevalent in digital spaces, necessitating scalable and interpretable detection systems. This paper introduces a unified multiclass classification framework for detecting ten distinct mental health and cyberbullying categories from social media data. We curate datasets from Twitter and Reddit, implementing a rigorous "split-then-balance" pipeline to train on balanced data while evaluating on a realistic, held-out imbalanced test set. We conduct a comprehensive evaluation comparing traditional lexical models, hybrid approaches, and several end-to-end fine-tuned transformers. Our results demonstrate that end-to-end fine-tuning is critical for performance, with the domain-adapted MentalBERT emerging as the top model, achieving an accuracy of 0.92 and a Macro F1 score of 0.76, surpassing both its generic counterpart and a zero-shot LLM baseline. Grounded in a comprehensive ethical analysis, we frame the system as a human-in-the-loop screening aid, not a diagnostic tool. To support this, we introduce a hybrid SHAP-LLM explainability framework and present a prototype dashboard ("Social Media Screener") designed to integrate model predictions and their explanations into a practical workflow for moderators. Our work provides a robust baseline, highlighting future needs for multi-label, clinically validated datasets at the critical intersection of online safety and computational mental health.
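
The "split-then-balance" ordering can be sketched as follows: the held-out test set is carved off first with a per-class (stratified) split, and only the remaining training portion is resampled, so the test set keeps its natural imbalance. Balancing before splitting could let duplicated minority posts land on both sides of the split and inflate test scores. This is a minimal standard-library sketch under stated assumptions: random oversampling stands in for whatever balancing method the paper actually uses, and the `(text, label)` pairs, labels, and function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(examples, test_frac, seed=0):
    """Split (text, label) pairs per label so each class keeps its share in the test set."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)
    train, test = [], []
    for items in by_label.values():
        items = items[:]  # copy before shuffling
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_frac))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def oversample(train, seed=0):
    """Assumed balancing step: duplicate random minority examples up to the largest class size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in train:
        by_label[ex[1]].append(ex)
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced


def split_then_balance(examples, test_frac=0.2, seed=0):
    """Split first, then balance only the training portion; the test set is never resampled."""
    train, test = stratified_split(examples, test_frac, seed)
    return oversample(train, seed), test


# Hypothetical corpus: 80 posts of one class, 20 of another.
data = ([(f"post {i}", "neutral") for i in range(80)]
        + [(f"post {i}", "stress") for i in range(80, 100)])
train, test = split_then_balance(data)
print(sorted(Counter(label for _, label in train).items()))  # [('neutral', 64), ('stress', 64)]
print(sorted(Counter(label for _, label in test).items()))   # [('neutral', 16), ('stress', 4)]
```

Because duplicates are drawn only from the training portion, no test post can appear in the training data, and the test set keeps its original 4:1 class ratio.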

Paper Structure

This paper contains 45 sections, 11 figures, 10 tables.

Figures (11)

  • Figure 1: Table showing the top 10 TF-IDF words for each class label
  • Figure 2: Correlation plot of TF-IDF embeddings across class labels
  • Figure 3: Precision-Recall Curve for the 'Suicide' class, showing near-perfect AUPRC scores for all fine-tuned models.
  • Figure 4: Calibration Curve for the 'Suicide' class. MentalBERT demonstrates the best calibration, with its predicted probabilities closely matching the observed frequencies.
  • Figure 5: Confusion matrix for Fine-tuned MentalBERT. The model's primary errors occur between semantically related classes (e.g., Anxiety and Stress), confirming that it learns content over platform-specific artifacts.
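
Figures 3 and 4 rely on two standard per-class diagnostics. The minimal plain-Python sketch below computes both for a single class treated one-vs-rest: average precision, a common step-wise estimator of the area under the precision-recall curve (AUPRC), and the equal-width probability binning behind a calibration curve. The labels and scores are hypothetical toy values, not the paper's model outputs, and the paper's own tooling is not stated here; scikit-learn offers comparable functions (`average_precision_score`, `calibration_curve`).

```python
def average_precision(y_true, scores):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.

    Ties in `scores` are not grouped; sorted() is stable, so tied items keep input order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive example")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


def calibration_bins(y_true, scores, n_bins=5):
    """Equal-width bins over [0, 1]; returns (mean predicted, observed positive rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, scores):
        idx = min(int(p * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:  # empty bins are skipped, as they have no observed frequency
            mean_p = sum(p for p, _ in b) / len(b)
            frac_pos = sum(y for _, y in b) / len(b)
            out.append((mean_p, frac_pos, len(b)))
    return out


# Hypothetical one-vs-rest labels (1 = 'Suicide', 0 = any other class)
# and hypothetical predicted probabilities for the 'Suicide' class.
y = [1, 1, 0, 1, 0, 0, 0, 0]
p = [0.95, 0.9, 0.85, 0.7, 0.3, 0.25, 0.1, 0.05]
print(round(average_precision(y, p), 3))  # 0.917
for mean_p, frac_pos, count in calibration_bins(y, p):
    print(f"predicted={mean_p:.2f} observed={frac_pos:.2f} n={count}")
```

For a well-calibrated model, the observed positive rate in each bin sits close to the mean predicted probability; a calibration curve plots these pairs against the diagonal.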