Self-Supervised Embeddings for Detecting Individual Symptoms of Depression
Sri Harsha Dumpala, Katerina Dikaios, Abraham Nunes, Frank Rudzicz, Rudolf Uher, Sageev Oore
TL;DR
This paper tackles symptom-level detection of depression, together with overall severity prediction, from speech. It uses self-supervised learning (SSL) speech embeddings to address the data scarcity typical of clinical settings. It evaluates a range of SSL models with distinct pretraining objectives to determine which linguistic or paralinguistic information (semantic, speaker, or prosodic) is most useful for identifying individual symptoms on the Montgomery-Åsberg Depression Rating Scale (MADRS). It also tests both single-model embeddings and multi-model fusion within a multi-task framework. The findings show that SSL embeddings consistently outperform conventional speech features: semantic-focused models excel on sadness-related symptoms, while speaker- and prosody-focused models help with other symptom categories. Combining multiple SSL embeddings yields further gains, and multi-task learning proves both efficient and effective. These results support SSL-based, symptom-aware depression assessment as a scalable approach to clinically grounded, speech-based mental health screening and monitoring.
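The multi-model fusion mentioned above can be pictured with a minimal sketch. The text does not say how embeddings are combined, so this assumes a simple early-fusion scheme: each SSL model's frame-level outputs are mean-pooled over time, and the resulting utterance-level vectors are concatenated. The model names (`semantic_model`, `speaker_model`) and the toy vectors are hypothetical placeholders, not outputs of real SSL models or the authors' pipeline.

```python
from typing import Dict, List

# Frame-level embeddings for one utterance: a list of frame vectors
# (frames x dims). Real SSL models would produce these; here they are toy lists.
Frames = List[List[float]]


def mean_pool(frames: Frames) -> List[float]:
    """Average frame vectors over time to get one utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def fuse_embeddings(per_model: Dict[str, Frames], order: List[str]) -> List[float]:
    """Hypothetical early fusion: mean-pool each model's frames, then
    concatenate the pooled vectors in a fixed model order."""
    fused: List[float] = []
    for name in order:
        fused.extend(mean_pool(per_model[name]))
    return fused


if __name__ == "__main__":
    utterance = {
        "semantic_model": [[1.0, 2.0], [3.0, 4.0]],  # hypothetical 2-dim outputs
        "speaker_model": [[0.5], [1.5], [2.5]],      # hypothetical 1-dim outputs
    }
    print(fuse_embeddings(utterance, ["semantic_model", "speaker_model"]))
    # -> [2.0, 3.0, 1.5]
```

Pooling before concatenating sidesteps the fact that different SSL models may emit different numbers of frames or different embedding sizes for the same utterance; the fused vector would then feed a downstream symptom classifier or severity regressor.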
Abstract
Depression, a prevalent mental health disorder affecting millions of people worldwide, demands reliable assessment systems. Unlike previous studies that focus solely on either detecting depression or predicting its severity, our work identifies individual symptoms of depression while also predicting its severity from speech. We leverage self-supervised learning (SSL)-based speech models to make better use of the small datasets typically available for this task. Our study shows that SSL embeddings yield notable performance improvements over conventional speech features. We compare various types of SSL pretrained models to elucidate which type of speech information (semantic, speaker, or prosodic) contributes most to identifying different symptoms. Additionally, we evaluate the impact of combining multiple SSL embeddings on performance. Furthermore, we show that multi-task learning is important for identifying depressive symptoms effectively.
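To make the multi-task idea concrete, here is a sketch of a joint training objective. It assumes, as an illustration rather than the authors' stated setup, that one shared model outputs a presence probability for each MADRS item plus a predicted total score. MADRS has 10 items rated 0 to 6, so totals range from 0 to 60. The binary symptom labels, the squared-error severity term, the rescaling by the maximum total, and the `severity_weight` parameter are all hypothetical choices.

```python
import math
from typing import List

NUM_ITEMS = 10               # MADRS has 10 items
MAX_TOTAL = 6 * NUM_ITEMS    # each item is rated 0-6, so totals span 0-60


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one symptom task (1 = present, 0 = absent)."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multitask_loss(symptom_probs: List[float], symptom_labels: List[int],
                   severity_pred: float, severity_true: float,
                   severity_weight: float = 1.0) -> float:
    """Hypothetical joint loss: mean per-symptom BCE + weighted severity error."""
    if len(symptom_probs) != NUM_ITEMS or len(symptom_labels) != NUM_ITEMS:
        raise ValueError(f"expected {NUM_ITEMS} symptom outputs and labels")
    # Classification part: average BCE over the 10 symptom tasks.
    cls = sum(bce(p, y) for p, y in zip(symptom_probs, symptom_labels)) / NUM_ITEMS
    # Regression part: squared error on totals rescaled to [0, 1], so the
    # severity term is on a scale comparable to the BCE terms.
    reg = ((severity_pred - severity_true) / MAX_TOTAL) ** 2
    return cls + severity_weight * reg


if __name__ == "__main__":
    probs = [0.5] * NUM_ITEMS
    labels = [1, 0] * 5
    print(round(multitask_loss(probs, labels, severity_pred=36.0, severity_true=6.0), 4))
    # -> 0.9431  (ln 2 from the symptom terms + 0.25 from the severity term)
```

Training one shared network on this summed loss, rather than a separate model per symptom, lets the tasks share learned structure, and a single forward pass serves all 11 outputs.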
