Efficient or Powerful? Trade-offs Between Machine Learning and Deep Learning for Mental Illness Detection on Social Media
Zhanyi Ding, Zhongyan Wang, Yeyubei Zhang, Yuchen Cao, Yunchong Liu, Xiaorui Shen, Yexin Tian, Jianglai Dai
TL;DR
This work compares machine learning (ML) and deep learning (DL) approaches for detecting mental illness signals in social-media text, weighing accuracy, interpretability, and computational efficiency. It evaluates logistic regression (LR), support vector machines (SVM), random forest (RF), and LightGBM alongside ALBERT and GRU on binary and multiclass tasks, using a Kaggle dataset and metrics such as F1 score and AUROC. The findings show comparable performance across models on medium-sized data: ML methods offer clearer interpretability and faster training, while DL models can better capture complex linguistic patterns at higher computational cost. The study provides empirical guidance for method selection based on dataset size, interpretability needs, and available computing resources, and highlights data labeling and ethical considerations for real-world deployment.
Abstract
Social media platforms provide valuable insights into mental health trends by capturing user-generated discussions on conditions such as depression, anxiety, and suicidal ideation. Machine learning (ML) and deep learning (DL) models have been increasingly applied to classify mental health conditions from textual data, but selecting the most effective model involves trade-offs in accuracy, interpretability, and computational efficiency. This study evaluates multiple ML models, including logistic regression, random forest, and LightGBM, alongside deep learning architectures such as ALBERT and Gated Recurrent Units (GRUs), for both binary and multi-class classification of mental health conditions. Our findings indicate that ML and DL models achieve comparable classification performance on medium-sized datasets, with ML models offering greater interpretability through variable importance scores, while DL models are better at capturing complex linguistic patterns. Additionally, ML models require explicit feature engineering, whereas DL models learn hierarchical representations directly from text. Logistic regression has the advantage of capturing both positive and negative associations between features and mental health conditions, whereas tree-based models rank features by their contribution to split decisions, which indicates how much a feature matters but not the direction of its association. This study offers empirical insights into the advantages and limitations of different modeling approaches and provides recommendations for selecting appropriate methods based on dataset size, interpretability needs, and computational constraints.
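
The point about logistic regression capturing both positive and negative associations can be illustrated with a minimal sketch. The example below is not the study's pipeline: the six toy posts, the binary bag-of-words features, and the helper names `featurize` and `train_logreg` are all assumptions made for illustration. It fits a small logistic regression by gradient descent and ranks words by their signed weights.

```python
import math

# Hypothetical toy posts, invented for illustration (NOT the study's Kaggle data).
# Label 1 = post flagged as showing a mental health signal, 0 = not flagged.
posts = [
    ("i feel hopeless and alone", 1),
    ("so hopeless and tired of everything", 1),
    ("feeling alone and hopeless again", 1),
    ("great hike with friends today", 0),
    ("happy weekend with friends", 0),
    ("enjoyed a great happy dinner", 0),
]

# Explicit feature engineering: a binary bag-of-words vocabulary.
vocab = sorted({word for text, _ in posts for word in text.split()})
index = {word: i for i, word in enumerate(vocab)}


def featurize(text):
    """Map a post to a binary vector: 1.0 if the vocabulary word occurs."""
    x = [0.0] * len(vocab)
    for word in text.split():
        if word in index:
            x[index[word]] = 1.0
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent on the L2-regularized logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # gradient of the log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


X = [featurize(text) for text, _ in posts]
y = [label for _, label in posts]
weights, bias = train_logreg(X, y)

# Sort words from most negative to most positive weight.
ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1])
for word, weight in ranked:
    print(f"{word:>10s} {weight:+.3f}")
```

In this toy setup, words that occur only in flagged posts (such as "hopeless") receive positive weights and words that occur only in unflagged posts (such as "friends") receive negative weights, so the sign carries the direction of the association. Impurity-based importances from tree ensembles are non-negative by construction, which is why they show how much a feature matters but not in which direction.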
