Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection

Ziyun Cui, Wen Wu, Wei-Qiang Zhang, Ji Wu, Chao Zhang

TL;DR

This work tackles data scarcity in Alzheimer's disease detection from spontaneous speech by transferring knowledge from large speech foundation models (speech-generic) and depression-specific signals. It analyzes intermediate layers of multiple foundation models to identify informative representations and proposes a parallel knowledge-transfer framework that jointly learns AD and depression tasks with a shared encoder and task-specific heads. A joint loss, $L = L_{AD} + \lambda L_{Dep}$ with $\lambda = 0.1$, enables cross-domain learning while preserving task-specific information, yielding improvements over single-task baselines and achieving a state-of-the-art F1-score of $0.928$ on the ADReSSo dataset. The results substantiate the connection between AD and depression and demonstrate practical impact for scalable, speech-based screening in resource-constrained settings.
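The joint objective above can be illustrated with a small sketch. It assumes each task head outputs a probability and that both $L_{AD}$ and $L_{Dep}$ are binary cross-entropy losses averaged over a batch; the summary does not state the per-task loss, so that choice, along with the function names `bce` and `joint_loss`, is hypothetical.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for one prediction (assumed per-task loss)."""
    eps = 1e-12
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def joint_loss(ad_probs, ad_labels, dep_probs, dep_labels, lam=0.1):
    """L = L_AD + lam * L_Dep, with each term averaged over its own batch."""
    l_ad = sum(bce(p, y) for p, y in zip(ad_probs, ad_labels)) / len(ad_probs)
    l_dep = sum(bce(p, y) for p, y in zip(dep_probs, dep_labels)) / len(dep_probs)
    return l_ad + lam * l_dep

# Toy head outputs (made-up numbers, not results from the paper)
loss = joint_loss([0.9, 0.2], [1, 0], [0.6, 0.4], [1, 0], lam=0.1)
```

With $\lambda = 0.1$ the depression term contributes only a tenth of its value, so under this reading the AD loss dominates the gradient while the depression task acts as an auxiliary signal on the shared encoder.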

Abstract

The detection of Alzheimer's disease (AD) from spontaneous speech has attracted increasing attention while the sparsity of training data remains an important issue. This paper handles the issue by knowledge transfer, specifically from both speech-generic and depression-specific knowledge. The paper first studies sequential knowledge transfer from generic foundation models pretrained on large amounts of speech and text data. A block-wise analysis is performed for AD diagnosis based on the representations extracted from different intermediate blocks of different foundation models. Apart from the knowledge from speech-generic representations, this paper also proposes to simultaneously transfer the knowledge from a speech depression detection task based on the high comorbidity rates of depression and AD. A parallel knowledge transfer framework is studied that jointly learns the information shared between these two tasks. Experimental results show that the proposed method improves AD and depression detection, and produces a state-of-the-art F1 score of 0.928 for AD diagnosis on the commonly used ADReSSo dataset.
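A block-wise analysis of the kind described in the abstract might be organised as in the sketch below. It assumes frame-level features from each intermediate block are mean-pooled into one vector per recording and then scored by some downstream classifier; the pooling choice and the `evaluate` callable are hypothetical placeholders, not details given in the abstract.

```python
from statistics import mean

def mean_pool(frames):
    """Average frame-level vectors into one utterance-level vector."""
    dim = len(frames[0])
    return [mean(frame[d] for frame in frames) for d in range(dim)]

def blockwise_scores(block_outputs, evaluate):
    """Score every block separately.

    block_outputs: {block_index: [utterance, ...]}, where each utterance
                   is a list of frame vectors taken from that block.
    evaluate:      hypothetical callable that trains and validates a
                   classifier on pooled features and returns a score.
    """
    return {
        block: evaluate([mean_pool(utt) for utt in utterances])
        for block, utterances in block_outputs.items()
    }

def best_block(scores):
    """Return the block index with the highest validation score."""
    return max(scores, key=scores.get)
```

Comparing the per-block scores shows which depth of a foundation model carries the most useful information for AD diagnosis, which is the question the block-wise analysis is meant to answer.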

Paper Structure (18 sections, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Overall structure of the proposed system.
  • Figure 2: The structure of block-wise analysis of speech foundation models.
  • Figure 3: The structure of the downstream model for knowledge transfer learning.
  • Figure 4: Summary of block-wise analysis of AD diagnosis.