A Machine Learning Approach for Detection of Mental Health Conditions and Cyberbullying from Social Media
Edward Ajayi, Martha Kachweka, Mawuli Deku, Emily Aiken
TL;DR
The paper tackles the challenge of detecting a broad set of mental health conditions and cyberbullying signals from social media by proposing a unified multiclass framework trained on Reddit and Twitter data. It shows that end-to-end fine-tuning of transformer models, particularly MentalBERT, yields the best overall performance (accuracy of 0.92 and Macro F1 of 0.76), and it argues that a split-then-balance data pipeline is necessary for realistic evaluation. The work advances practical screening tools for moderators by introducing a hybrid SHAP-LLM explainability system and a prototype dashboard, the Social Media Screener, that integrates predictions and explanations into moderation workflows. It also discusses ethical considerations, limitations, and future directions, including multi-label and multilingual extensions that would better reflect real-world use in online safety and computational mental health.
Abstract
Mental health challenges and cyberbullying are increasingly prevalent in digital spaces, necessitating scalable and interpretable detection systems. This paper introduces a unified multiclass classification framework for detecting ten distinct mental health and cyberbullying categories from social media data. We curate datasets from Twitter and Reddit, implementing a rigorous "split-then-balance" pipeline to train on balanced data while evaluating on a realistic, held-out imbalanced test set. We conduct a comprehensive evaluation comparing traditional lexical models, hybrid approaches, and several end-to-end fine-tuned transformers. Our results demonstrate that end-to-end fine-tuning is critical for performance, with the domain-adapted MentalBERT emerging as the top model, achieving an accuracy of 0.92 and a Macro F1 score of 0.76, surpassing both its generic counterpart and a zero-shot LLM baseline. Grounded in a comprehensive ethical analysis, we frame the system as a human-in-the-loop screening aid, not a diagnostic tool. To support this, we introduce a hybrid SHAP-LLM explainability framework and present a prototype dashboard ("Social Media Screener") designed to integrate model predictions and their explanations into a practical workflow for moderators. Our work provides a robust baseline, highlighting future needs for multi-label, clinically validated datasets at the critical intersection of online safety and computational mental health.
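The "split-then-balance" idea from the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the toy label counts, the function names, the 20% test fraction, and the choice of undersampling as the balancing method are all assumptions made here (the abstract does not say how balancing is performed). The sketch also computes accuracy and Macro F1 for a trivial majority-class predictor, to show why the two metrics can diverge on an imbalanced test set.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test, preserving each class's share in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx


def balance_by_undersampling(indices, labels, seed=0):
    """Downsample every class to the size of the rarest class (assumed method)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = min(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(rng.sample(idxs, target))
    return balanced


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, heavily imbalanced toy labels (not the paper's data).
labels = ["none"] * 80 + ["depression"] * 15 + ["bullying"] * 5

# 1) Split first, so the test set keeps the realistic class imbalance.
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)

# 2) Balance only the training portion; the test set is never resampled.
balanced_train_idx = balance_by_undersampling(train_idx, labels, seed=0)

train_counts = Counter(labels[i] for i in balanced_train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("balanced train:", dict(train_counts))
print("imbalanced test:", dict(test_counts))

# A majority-class predictor looks acceptable on accuracy but poor on Macro F1.
y_true = [labels[i] for i in test_idx]
y_pred = ["none"] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy, "macro F1:", round(macro_f1(y_true, y_pred), 3))
```

Balancing before splitting would instead let resampled examples shape the test set, so the evaluation would no longer reflect the class distribution a moderator sees in practice. That is the motivation the abstract gives for evaluating on a held-out imbalanced set.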
