Not All Errors Are Equal: Investigation of Speech Recognition Errors in Alzheimer's Disease Detection
Jiawen Kang, Junan Li, Jinchao Li, Xixin Wu, Helen Meng
TL;DR
The study addresses how ASR errors affect BERT-based Alzheimer's disease (AD) detection from spontaneous speech. Using the ADReSS dataset, a TDNN ASR front-end is paired with a BERT-PCA-SVM detection pipeline, revealing a non-linear relationship between word error rate (WER) and detection performance: transcripts with a high error rate can still yield AD classification accuracy close to the best observed. Crucially, stopwords dominate ASR error counts but contribute minimally to discrimination, while task-related keywords, though a small fraction of errors, exert disproportionate influence on classification. These findings highlight the importance of preserving diagnostically relevant words and suggest avenues for ASR-robust AD detection systems that leverage keyword signals and semantic structure rather than bulk word accuracy. The work informs practical design strategies for scalable, speech-based AD screening by clarifying which error types matter most for downstream decision-making.
Abstract
Automatic Speech Recognition (ASR) plays an important role in speech-based automatic detection of Alzheimer's disease (AD). However, recognition errors could propagate downstream, potentially impacting the detection decisions. Recent studies have revealed a non-linear relationship between word error rates (WER) and AD detection performance, where ASR transcriptions with notable errors could still yield AD detection accuracy equivalent to that based on manual transcriptions. This work presents a series of analyses to explore the effect of ASR transcription errors in BERT-based AD detection systems. Our investigation reveals that not all ASR errors contribute equally to detection performance. Certain words, such as stopwords, despite constituting a large proportion of errors, are shown to play a limited role in distinguishing AD. In contrast, the keywords related to diagnosis tasks exhibit significantly greater importance relative to other words. These findings provide insights into the interplay between ASR errors and the downstream detection model.
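To make the distinction between bulk WER and error type concrete, the following is a minimal Python sketch, not the authors' implementation. It aligns a reference transcript with an ASR hypothesis by edit distance, computes WER as (substitutions + deletions + insertions) divided by the reference length, and then splits the errors into stopword and non-stopword errors. The sentence pair, the `STOPWORDS` set, and the function names `align_errors` and `wer` are illustrative assumptions and do not come from the paper.

```python
# Hypothetical stopword list for illustration only.
STOPWORDS = {"the", "a", "is", "from"}

def align_errors(ref, hyp):
    """Align two token lists by edit distance and return the list of errors
    as (operation, reference_word, hypothesis_word) tuples."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    errors = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                errors.append(("sub", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            errors.append(("del", ref[i - 1], None))
            i -= 1
        else:
            errors.append(("ins", None, hyp[j - 1]))
            j -= 1
    errors.reverse()
    return errors

def wer(ref, hyp):
    """Word error rate: number of edit operations over reference length."""
    return len(align_errors(ref, hyp)) / len(ref)

# Made-up example pair (not data from the study).
ref = "the boy is taking a cookie from the jar".split()
hyp = "a boy is taking cooking from jar".split()

errors = align_errors(ref, hyp)
# An error is attributed to the reference word when there is one,
# otherwise to the inserted hypothesis word.
stopword_errors = [e for e in errors if (e[1] or e[2]) in STOPWORDS]
other_errors = [e for e in errors if (e[1] or e[2]) not in STOPWORDS]

print(f"WER = {wer(ref, hyp):.3f}")
print("stopword errors:", stopword_errors)
print("other errors:", other_errors)
```

In this toy pair most of the errors fall on stopwords, while a single error hits a content word that a downstream classifier may depend on. A breakdown of this kind, rather than the WER figure alone, is what separates the error types discussed above.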
