Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection

Ziyun Cui; Wen Wu; Wei-Qiang Zhang; Ji Wu; Chao Zhang

TL;DR

This work tackles data scarcity in Alzheimer's disease (AD) detection from spontaneous speech by transferring two kinds of knowledge: speech-generic knowledge from large pretrained foundation models and depression-specific knowledge from a speech depression detection task. It analyzes the intermediate blocks of multiple foundation models to identify informative representations, and proposes a parallel knowledge-transfer framework that jointly learns the AD and depression tasks with a shared encoder and task-specific heads. A joint loss, $L = L_{AD} + \lambda L_{Dep}$ with $\lambda = 0.1$, enables cross-task learning while preserving task-specific information, yielding improvements over single-task baselines and a state-of-the-art F1 score of $0.928$ on the ADReSSo dataset. The results support the connection between AD and depression and point to the potential of scalable, speech-based screening in resource-constrained settings.
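
To make the joint objective concrete, here is a minimal pure-Python sketch of the forward pass and loss only, assuming both tasks are binary classification scored with binary cross-entropy. The shared-encoder/task-specific-head layout and $\lambda = 0.1$ come from the summary above; the layer sizes, ReLU activation, random initialisation, and the names `ParallelTransferModel`, `bce`, and `joint_loss` are hypothetical. Training would backpropagate this loss through both heads and the shared encoder, which an autodiff framework would normally handle.

```python
import math
import random
from typing import List, Sequence, Tuple

Batch = Sequence[Tuple[Sequence[float], int]]  # (feature vector, 0/1 label) pairs


def _linear(x: Sequence[float], weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> List[float]:
    # One output per weight row: y_i = sum_j W_ij * x_j + b_i
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weights, bias)]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    # Binary cross-entropy for one prediction (assumed loss); clamping avoids log(0).
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


class ParallelTransferModel:
    """Hypothetical toy model: one shared encoder feeding an AD head and a depression head."""

    def __init__(self, in_dim: int, hid_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def init() -> float:
            return rng.uniform(-0.1, 0.1)

        # Shared encoder parameters, updated by both loss terms during training.
        self.enc_w = [[init() for _ in range(in_dim)] for _ in range(hid_dim)]
        self.enc_b = [0.0] * hid_dim
        # Task-specific heads: each maps the shared hidden vector to a single logit.
        self.heads = {task: ([[init() for _ in range(hid_dim)]], [0.0])
                      for task in ("ad", "dep")}

    def predict(self, x: Sequence[float], task: str) -> float:
        hidden = [max(0.0, h) for h in _linear(x, self.enc_w, self.enc_b)]  # shared, ReLU
        w, b = self.heads[task]
        return _sigmoid(_linear(hidden, w, b)[0])


def joint_loss(model: ParallelTransferModel, ad_batch: Batch, dep_batch: Batch,
               lam: float = 0.1) -> float:
    """L = L_AD + lam * L_Dep, each term averaged over its own labelled batch."""
    l_ad = sum(bce(model.predict(x, "ad"), y) for x, y in ad_batch) / len(ad_batch)
    l_dep = sum(bce(model.predict(x, "dep"), y) for x, y in dep_batch) / len(dep_batch)
    return l_ad + lam * l_dep
```

Setting `lam=0.0` reduces the objective to a single-task AD baseline. With $\lambda = 0.1$, the AD term contributes most of the signal to the shared encoder when both losses are of similar size, while the depression data still shapes the shared representation.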

Abstract

The detection of Alzheimer's disease (AD) from spontaneous speech has attracted increasing attention, but the scarcity of training data remains an important issue. This paper addresses the issue through knowledge transfer, drawing on both speech-generic and depression-specific knowledge. The paper first studies sequential knowledge transfer from generic foundation models pretrained on large amounts of speech and text data. A block-wise analysis is performed for AD diagnosis based on the representations extracted from different intermediate blocks of different foundation models. In addition to speech-generic representations, this paper proposes to simultaneously transfer knowledge from a speech depression detection task, motivated by the high comorbidity rate of depression and AD. A parallel knowledge transfer framework is studied that jointly learns the information shared between the two tasks. Experimental results show that the proposed method improves AD and depression detection and produces a state-of-the-art F1 score of 0.928 for AD diagnosis on the commonly used ADReSSo dataset.
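
The block-wise analysis can be pictured as scoring each intermediate block on its own. The sketch below is a hypothetical, dependency-free illustration: it assumes frame-level hidden states have already been extracted for every block (with Hugging Face `transformers` speech models, for example, by passing `output_hidden_states=True`), mean-pools them per utterance, fits a nearest-centroid probe per block, and ranks blocks by F1 on a held-out split. The nearest-centroid probe is a stand-in for the paper's downstream AD detection block, and all function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # rows are vectors (one pooled vector per utterance)


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors (frames -> utterance, or members -> centroid)."""
    if not vectors:
        raise ValueError("need at least one vector to average")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def f1_score(preds: Sequence[int], labels: Sequence[int]) -> float:
    """F1 for the positive class (label 1, e.g. AD): 2TP / (2TP + FP + FN)."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def nearest_centroid_predict(train_x: Matrix, train_y: Sequence[int],
                             test_x: Matrix) -> List[int]:
    """Stand-in probe: label each test vector with the class whose mean vector is closest."""
    centroids = {
        c: mean_pool([x for x, y in zip(train_x, train_y) if y == c]) for c in (0, 1)
    }

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min((0, 1), key=lambda c: sq_dist(x, centroids[c])) for x in test_x]


def blockwise_analysis(train_blocks: Dict[int, Matrix], train_y: Sequence[int],
                       dev_blocks: Dict[int, Matrix],
                       dev_y: Sequence[int]) -> Tuple[int, Dict[int, float]]:
    """Fit one probe per block on pooled features; return the best block and all dev F1 scores."""
    scores: Dict[int, float] = {}
    for block in sorted(train_blocks):
        preds = nearest_centroid_predict(train_blocks[block], train_y, dev_blocks[block])
        scores[block] = f1_score(preds, dev_y)
    best = max(scores, key=scores.get)  # ties go to the earliest block
    return best, scores
```

Keeping the probe deliberately simple means that differences in F1 across blocks mostly reflect how much AD-relevant information each block carries, rather than the capacity of the classifier.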

Paper Structure

This paper contains 18 sections, 4 figures, and 5 tables.

Introduction
Related work
Proposed method
System structure
Speech foundation model
ASR system and text foundation model
Downstream AD detection block
Speech-generic knowledge transfer
Depression-specific knowledge transfer
Experimental setup
Datasets
Data augmentation
Implementation details
Experimental Results of Transferring Speech-generic knowledge
Block-wise analysis of speech foundation models
...and 3 more sections

Figures (4)

Figure 1: Overall structure of the proposed system.
Figure 2: The structure of block-wise analysis of speech foundation models.
Figure 3: The structure of the downstream model for knowledge transfer learning.
Figure 4: Summary of block-wise analysis of AD diagnosis.