multiMentalRoBERTa: A Fine-tuned Multiclass Classifier for Mental Health Disorder
K M Sajjadul Islam, John Fields, Praveen Madiraju
TL;DR
multiMentalRoBERTa is a RoBERTa model fine-tuned for six-class mental health classification of social media text: stress, anxiety, depression, post-traumatic stress disorder (PTSD), suicidal ideation, and neutral discourse. It combines multiple curated datasets with embedding-based data exploration and reaches macro-F1 scores of $0.839$ (six-class) and $0.870$ (five-class, excluding stress), outperforming traditional machine learning baselines, fine-tuned MentalBERT, and prompting-based LLMs. Explainability analyses using Layer Integrated Gradients and KeyBERT reveal clinically meaningful lexical cues, particularly those that separate depression from suicidal ideation. The work also addresses label noise and safety, emphasizing bias mitigation and human-in-the-loop safeguards. The model is lightweight and readily deployable for peer-support platforms and social media monitoring; future work will extend clinical evaluation and integration into triage tools.
Abstract
The early detection of mental health disorders from social media text is critical for enabling timely support, risk assessment, and referral to appropriate resources. This work introduces multiMentalRoBERTa, a fine-tuned RoBERTa model designed for multiclass classification of common mental health conditions, including stress, anxiety, depression, post-traumatic stress disorder (PTSD), suicidal ideation, and neutral discourse. Data exploration across multiple curated datasets analyzes class overlaps, revealing strong correlations between depression and suicidal ideation as well as between anxiety and PTSD, while stress emerges as a broad, overlapping category. Comparative experiments with traditional machine learning methods, domain-specific transformers, and prompting-based large language models demonstrate that multiMentalRoBERTa achieves superior performance, with macro-F1 scores of 0.839 in the six-class setup and 0.870 in the five-class setup (excluding stress), outperforming both fine-tuned MentalBERT and baseline classifiers. Beyond predictive accuracy, explainability methods, including Layer Integrated Gradients and KeyBERT, are applied to identify lexical cues that drive classification, with a particular focus on distinguishing depression from suicidal ideation. The findings emphasize the effectiveness of fine-tuned transformers for reliable and interpretable detection in sensitive contexts, while also underscoring the importance of fairness, bias mitigation, and human-in-the-loop safety protocols. Overall, multiMentalRoBERTa is presented as a lightweight, robust, and deployable solution for enhancing support in mental health platforms.
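Since the headline results are reported as macro-F1, a minimal sketch of how that metric is computed may help when reading the numbers. The function name `macro_f1`, the label strings, and the toy predictions below are illustrative assumptions, not the authors' evaluation code or data. Macro-F1 is the unweighted mean of per-class F1, so a rare class counts as much as a frequent one.

```python
from collections import Counter

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over `labels` (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but wrong
            fn[t] += 1          # t was the true class but missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Assumption: a class with no true or predicted samples scores 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)

# Toy example with three of the six classes (made-up labels, not paper data).
y_true = ["depression", "depression", "suicidal", "neutral"]
y_pred = ["depression", "suicidal", "suicidal", "neutral"]
score = macro_f1(y_true, y_pred, ["depression", "suicidal", "neutral"])
print(round(score, 4))
```

In this toy case a single depression post misread as suicidal ideation lowers the F1 of both classes, which illustrates why overlap between those two categories weighs on the macro average.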
