IHearYou: Linking Acoustic Features to DSM-5 Depressive Behavior Indicators

Jonas Länzlinger, Katharina Müller, Burkhard Stiller, Bruno Rodrigues

TL;DR

This work addresses the need for objective, interpretable, and privacy‑preserving depression assessment by linking acoustic speech features to DSM‑5 depressive indicators through a DSM‑5–aligned, on‑device Linkage Framework. The IHearYou system maps low‑level voice metrics to clinically meaningful indicators through transparent, testable rules, enabling explainable DSM‑5 scores without cloud processing. It demonstrates reproducible results on the DAIC‑WOZ dataset with a configuration‑driven protocol (including FDR control and gender stratification) and validates end‑to‑end feasibility via TESS streaming, all on commodity hardware. Although directional feature–indicator associations emerge, the study notes limitations due to sample size and calls for larger longitudinal, multimodal deployments that improve robustness and clinical utility while preserving edge privacy and interpretability.

Abstract

Depression affects millions of people worldwide, yet diagnosis still relies on subjective self-reports and interviews that may not capture authentic behavior. We present IHearYou, an approach to automated depression detection focused on speech acoustics. Using passive sensing in household environments, IHearYou extracts voice features and links them to DSM-5 (Diagnostic and Statistical Manual of Mental Disorders) indicators through a structured Linkage Framework instantiated for Major Depressive Disorder. The system runs locally to preserve privacy and includes a persistence schema and dashboard, demonstrating real-time throughput on a commodity laptop. To ensure reproducibility, we define a configuration-driven protocol with False Discovery Rate (FDR) correction and gender-stratified testing. Applied to the DAIC-WOZ dataset, this protocol reveals directionally consistent feature-indicator associations, while a TESS-based audio streaming experiment validates end-to-end feasibility. Our results show how passive voice sensing can be turned into explainable DSM-5 indicator scores, bridging the gap between black-box detection and clinically interpretable, on-device analysis.
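
To make the FDR-corrected, gender-stratified testing mentioned in the abstract more concrete, the following is a minimal sketch of how such a correction could look. It assumes the Benjamini-Hochberg step-up procedure, which the abstract does not name explicitly; the function names (`benjamini_hochberg`, `stratified_fdr`), the `alpha` default, and the example p-values are all hypothetical and not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up procedure (not confirmed by the source).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Step-up rule: reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


def stratified_fdr(p_by_group, alpha=0.05):
    """Apply the correction separately within each stratum (e.g. per gender)."""
    return {group: benjamini_hochberg(p, alpha) for group, p in p_by_group.items()}


if __name__ == "__main__":
    # Hypothetical p-values for feature-indicator tests, one list per stratum.
    example = {
        "female": [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205],
        "male": [0.02, 0.04, 0.045],
    }
    for group, flags in stratified_fdr(example).items():
        print(group, flags)
```

Correcting within each stratum keeps the number of tests, and therefore the rejection thresholds, specific to that group; whether the paper corrects per stratum or across all tests jointly is not stated in the abstract.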

Paper Structure

This paper contains 19 sections, 6 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: High‑level system architecture.
  • Figure 2: Persistence layer overview: database collections model.
  • Figure 3: Linking feature intersections – matrix plot.
  • Figure 4: Distribution of selected male and female acoustic voice features.
  • Figure 5: Cohen's d (positive = higher in males, negative = higher in females).
  • ...and 6 more figures