Integrating Pause Information with Word Embeddings in Language Models for Alzheimer's Disease Detection from Spontaneous Speech
Yu Pu, Wei-Qiang Zhang
TL;DR
This work tackles early Alzheimer's disease detection from spontaneous speech by leveraging pauses as temporal cues. It introduces a pause-embedding mechanism that encodes word durations and inter-word pauses, and integrates these embeddings into a BERT-based language model. A two-stage training regime uses GigaSpeech for pretraining the pause representations and ADReSSo for task-specific fine-tuning, achieving an accuracy of 83.1% on the ADReSSo test set. The results demonstrate that pause information is a valuable non-invasive biomarker for AD and suggest broader applicability to neurodegenerative disease detection.
Abstract
Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by cognitive decline and memory loss. Early detection of AD is crucial for effective intervention and treatment. In this paper, we propose a novel approach to AD detection from spontaneous speech that incorporates pause information into language models. Our method encodes pause information into embeddings and integrates them into a typical transformer-based language model, enabling it to capture both semantic and temporal features of speech data. We conduct experiments on the Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) dataset and its extension, the ADReSSo dataset, comparing our method with existing approaches. Our method achieves an accuracy of 83.1% on the ADReSSo test set. The results demonstrate the effectiveness of our approach in discriminating between AD patients and healthy individuals, highlighting the potential of pauses as a valuable indicator for AD detection. By leveraging speech analysis as a non-invasive and cost-effective tool for AD detection, our research contributes to early diagnosis and improved management of this debilitating disease.
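To make the idea of a pause embedding concrete, the following is a minimal sketch rather than the method used in this work. It assumes word-level timestamps are available (for example from forced alignment), quantizes the pause after each word into a few bins, and adds a per-bin vector to the word embedding before it would enter the transformer. The bin edges, the function names `pause_bin_ids` and `add_pause_embeddings`, and the choice of summation are all illustrative assumptions; word durations, which the approach also encodes, are left out for brevity.

```python
import numpy as np

# Hypothetical bin edges in seconds; these values are assumptions for illustration.
PAUSE_BIN_EDGES = np.array([0.05, 0.2, 0.5, 1.0, 2.0])
NUM_PAUSE_BINS = len(PAUSE_BIN_EDGES) + 1  # np.digitize yields ids 0..len(edges)


def pause_bin_ids(word_times):
    """Return one pause-bin id per word.

    word_times: list of (start, end) tuples in seconds, ordered by start time.
    The pause attached to a word is the silence before the next word;
    the final word has no following pause and falls into bin 0.
    """
    starts = np.array([s for s, _ in word_times], dtype=float)
    ends = np.array([e for _, e in word_times], dtype=float)
    pauses = np.zeros(len(word_times))
    # Clip at zero so overlapping alignments do not produce negative pauses.
    pauses[:-1] = np.maximum(starts[1:] - ends[:-1], 0.0)
    return np.digitize(pauses, PAUSE_BIN_EDGES)


def add_pause_embeddings(word_emb, bin_ids, pause_table):
    """Sum word embeddings with the embedding of each word's pause bin.

    word_emb:    array of shape (num_words, dim)
    bin_ids:     integer array of shape (num_words,)
    pause_table: array of shape (NUM_PAUSE_BINS, dim), learned in practice
    """
    return word_emb + pause_table[bin_ids]
```

In a real model the rows of `pause_table` would be trainable parameters and the summed vectors would replace the usual token embeddings at the input of the language model; here they are plain arrays so the sketch runs without a deep learning framework.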
