Self-Supervised Embeddings for Detecting Individual Symptoms of Depression

Sri Harsha Dumpala, Katerina Dikaios, Abraham Nunes, Frank Rudzicz, Rudolf Uher, Sageev Oore

TL;DR

This paper tackles symptom-level detection of depression and overall severity prediction from speech using self-supervised speech embeddings to address data scarcity in clinical settings. It evaluates a range of SSL models with distinct pretraining objectives to determine which linguistic or paralinguistic information—semantic, speaker, or prosodic—most benefits identifying MADRS symptoms, and it tests both single-model and multi-model fusion under a multi-task framework. The findings show SSL embeddings consistently outperform conventional features, with semantic-focused models excelling on sadness-related symptoms and speaker/prosodic models aiding other symptom categories; combining multiple SSL embeddings yields additional gains, and multi-task learning is both efficient and effective. These results support SSL-based, symptom-aware depression assessment as a scalable approach for clinically grounded, speech-based mental health screening and monitoring.

Abstract

Depression, a prevalent mental health disorder impacting millions globally, demands reliable assessment systems. Unlike previous studies that focus solely on either detecting depression or predicting its severity, our work identifies individual symptoms of depression while also predicting its severity using speech input. We leverage self-supervised learning (SSL)-based speech models to better utilize the small-sized datasets that are frequently encountered in this task. Our study demonstrates notable performance improvements by utilizing SSL embeddings compared to conventional speech features. We compare various types of SSL pretrained models to elucidate the type of speech information (semantic, speaker, or prosodic) that contributes the most in identifying different symptoms. Additionally, we evaluate the impact of combining multiple SSL embeddings on performance. Furthermore, we show the significance of multi-task learning for identifying depressive symptoms effectively.

Paper Structure

This paper contains 8 sections, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Schematic diagram of the symptom detection model using (a) single and (b) multiple SSL models (N models). In this work, N = 2 or 3. Multi-task setting will have 10 symptom detection heads in the output layer ($S_i$ for $1 \leq i \leq 10$) along with one regression (Reg) head. In single-task setting, there will be only one head (specific-symptom detection or regression).
  • Figure 2: Distribution of samples between class-0 (symptom absent) and class-1 (symptom present) for each symptom in the MADRS. Symptom abbreviations are provided in Table \ref{tab:madrs_symptoms}.
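The multi-task architecture described in Figure 1 can be sketched as follows: embeddings from N SSL models are concatenated (fusion, Figure 1b) and fed through a shared layer into 10 symptom-detection heads plus one severity-regression head. The embedding dimension, hidden size, and random weights below are illustrative assumptions for the sketch, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only (not from the paper).
N_MODELS = 2          # number of fused SSL models; the paper uses N = 2 or 3
EMB_DIM = 768         # assumed per-model SSL embedding size
HIDDEN = 128          # assumed shared hidden-layer size
N_SYMPTOMS = 10       # one detection head per MADRS symptom (S_1 .. S_10)

def multitask_heads(fused, W_h, b_h, W_cls, b_cls, w_reg, b_reg):
    """Shared layer feeding 10 symptom heads and one regression (Reg) head."""
    h = np.tanh(fused @ W_h + b_h)                         # shared representation
    logits = h @ W_cls + b_cls                             # (N_SYMPTOMS,)
    symptom_probs = 1.0 / (1.0 + np.exp(-logits))          # sigmoid per symptom head
    severity = float(h @ w_reg + b_reg)                    # scalar severity estimate
    return symptom_probs, severity

# Fuse embeddings from N SSL models by concatenation (as in Figure 1b).
embs = [rng.normal(size=EMB_DIM) for _ in range(N_MODELS)]
fused = np.concatenate(embs)                               # (N_MODELS * EMB_DIM,)

# Randomly initialized parameters, standing in for trained weights.
W_h = rng.normal(scale=0.01, size=(N_MODELS * EMB_DIM, HIDDEN))
b_h = np.zeros(HIDDEN)
W_cls = rng.normal(scale=0.01, size=(HIDDEN, N_SYMPTOMS))
b_cls = np.zeros(N_SYMPTOMS)
w_reg = rng.normal(scale=0.01, size=HIDDEN)
b_reg = 0.0

probs, severity = multitask_heads(fused, W_h, b_h, W_cls, b_cls, w_reg, b_reg)
print(probs.shape, severity)
```

In the single-task setting described in the caption, the same backbone would carry only one of these heads (a single symptom detector or the regression head) instead of all eleven.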