Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection
Ziyun Cui, Wen Wu, Wei-Qiang Zhang, Ji Wu, Chao Zhang
TL;DR
This work tackles data scarcity in Alzheimer's disease (AD) detection from spontaneous speech by transferring two kinds of knowledge: speech-generic knowledge from large pretrained foundation models and depression-specific knowledge from a speech depression detection task. It analyzes the intermediate layers of multiple foundation models to identify informative representations, and proposes a parallel knowledge-transfer framework that jointly learns the AD and depression tasks with a shared encoder and task-specific heads. A joint loss, $L = L_{AD} + \lambda L_{Dep}$ with $\lambda = 0.1$, enables cross-domain learning while preserving task-specific information. This yields improvements over single-task baselines and a state-of-the-art F1 score of $0.928$ on the ADReSSo dataset. The results support the connection between AD and depression and suggest practical value for scalable, speech-based screening in resource-constrained settings.
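The joint objective above can be illustrated with a small forward pass. The following is a minimal NumPy sketch, not the authors' implementation: it assumes both tasks are binary classification trained with binary cross-entropy, that AD and depression samples come from separate datasets but pass through one shared encoder, and it uses a hypothetical one-layer ReLU encoder with linear heads (`ParallelTransferModel` and `joint_loss` are illustrative names). Only the weighting $L = L_{AD} + \lambda L_{Dep}$ with $\lambda = 0.1$ comes from the text.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, y, eps=1e-7):
    # Binary cross-entropy averaged over the batch; clipping avoids log(0).
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


class ParallelTransferModel:
    """Hypothetical shared encoder with separate AD and depression heads."""

    def __init__(self, in_dim, hid_dim, rng):
        self.W_enc = rng.normal(0.0, 0.1, (in_dim, hid_dim))  # shared by both tasks
        self.w_ad = rng.normal(0.0, 0.1, hid_dim)              # AD-specific head
        self.w_dep = rng.normal(0.0, 0.1, hid_dim)             # depression-specific head

    def encode(self, x):
        return relu(x @ self.W_enc)

    def predict_ad(self, x):
        return sigmoid(self.encode(x) @ self.w_ad)

    def predict_dep(self, x):
        return sigmoid(self.encode(x) @ self.w_dep)


def joint_loss(model, x_ad, y_ad, x_dep, y_dep, lam=0.1):
    # L = L_AD + lambda * L_Dep; each task's batch goes through the same encoder.
    l_ad = bce(model.predict_ad(x_ad), y_ad)
    l_dep = bce(model.predict_dep(x_dep), y_dep)
    return l_ad + lam * l_dep, l_ad, l_dep


# Usage with random stand-in features (16-dim) and binary labels.
rng = np.random.default_rng(0)
model = ParallelTransferModel(in_dim=16, hid_dim=32, rng=rng)
x_ad, y_ad = rng.normal(size=(8, 16)), rng.integers(0, 2, size=8)
x_dep, y_dep = rng.normal(size=(6, 16)), rng.integers(0, 2, size=6)
total, l_ad, l_dep = joint_loss(model, x_ad, y_ad, x_dep, y_dep)
```

During training, gradients of the depression term would also reach the shared weights `W_enc`, which is where the cross-task transfer happens, while the small $\lambda$ keeps the AD objective dominant. Calling `joint_loss` with `lam=0.0` reduces it to a single-task AD loss.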
Abstract
The detection of Alzheimer's disease (AD) from spontaneous speech has attracted increasing attention, while the sparsity of training data remains an important issue. This paper addresses the issue through knowledge transfer, drawing on both speech-generic and depression-specific knowledge. The paper first studies sequential knowledge transfer from generic foundation models pretrained on large amounts of speech and text data. A block-wise analysis of AD diagnosis is performed using representations extracted from different intermediate blocks of different foundation models. In addition to speech-generic representations, this paper also proposes to simultaneously transfer knowledge from a speech depression detection task, motivated by the high comorbidity of depression and AD. A parallel knowledge transfer framework is studied that jointly learns the information shared between these two tasks. Experimental results show that the proposed method improves both AD and depression detection, and achieves a state-of-the-art F1 score of 0.928 for AD diagnosis on the commonly used ADReSSo dataset.
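The block-wise analysis can be illustrated as a probing loop: pool each block's frame-level representations into one vector per recording, fit a simple classifier on each block separately, and compare held-out F1 scores. This is a hedged sketch, not the paper's procedure. The pooling and classifier used in the paper are not given here, so it assumes mean pooling over frames and swaps in a nearest-centroid probe. The synthetic `hidden_states` stand in for real per-block outputs (for instance, those returned by a Hugging Face `transformers` model called with `output_hidden_states=True`), and block 2 is made informative artificially so the loop has something to find. All function names are illustrative.

```python
import numpy as np


def f1_score(y_true, y_pred):
    # F1 for the positive class (label 1); returns 0.0 when there are no true positives.
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def nearest_centroid_probe(train_x, train_y, test_x):
    # Assign each test vector to the closer class centroid (stand-in classifier).
    c0 = train_x[train_y == 0].mean(axis=0)
    c1 = train_x[train_y == 1].mean(axis=0)
    d0 = np.linalg.norm(test_x - c0, axis=1)
    d1 = np.linalg.norm(test_x - c1, axis=1)
    return (d1 < d0).astype(int)


def blockwise_f1(hidden_states, labels, train_idx, test_idx):
    """hidden_states: list of per-block arrays, each (num_utts, num_frames, dim)."""
    scores = []
    for block in hidden_states:
        pooled = block.mean(axis=1)  # mean-pool over frames -> (num_utts, dim)
        pred = nearest_centroid_probe(pooled[train_idx], labels[train_idx], pooled[test_idx])
        scores.append(f1_score(labels[test_idx], pred))
    return scores


# Synthetic stand-in: 4 blocks, 60 recordings, 10 frames, 8-dim features.
rng = np.random.default_rng(0)
n, frames, dim, num_blocks = 60, 10, 8, 4
labels = np.array([0, 1] * (n // 2))
hidden_states = [rng.normal(size=(n, frames, dim)) for _ in range(num_blocks)]
hidden_states[2][labels == 1] += 1.5  # artificially make block 2 carry the label signal
idx = rng.permutation(n)
train_idx, test_idx = idx[:40], idx[40:]
scores = blockwise_f1(hidden_states, labels, train_idx, test_idx)
best_block = int(np.argmax(scores))
```

With real features, the per-block scores would show which intermediate blocks of a given foundation model carry the most AD-relevant information; a stronger probe (for example, a small trained classifier) could replace the nearest-centroid stand-in without changing the loop.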
