Not All Errors Are Equal: Investigation of Speech Recognition Errors in Alzheimer's Disease Detection

Jiawen Kang, Junan Li, Jinchao Li, Xixin Wu, Helen Meng

TL;DR

The study examines how ASR errors affect BERT-based Alzheimer's disease (AD) detection from spontaneous speech. Using the ADReSS dataset, a TDNN ASR front-end is paired with a BERT → PCA → SVM detection pipeline, revealing a non-linear relationship between word error rate (WER) and detection performance: transcripts with high error rates can still yield AD classification accuracy close to that obtained from manual transcripts. Stopwords dominate ASR error counts but contribute little to discrimination, while task-related keywords, though a small fraction of errors, exert a disproportionate influence on classification. These findings highlight the importance of preserving diagnostically relevant words and suggest that ASR-robust AD detection systems should leverage keyword signals and semantic structure rather than bulk word accuracy. By clarifying which error types matter most for downstream decisions, the work informs the design of scalable, speech-based AD screening.

Abstract

Automatic Speech Recognition (ASR) plays an important role in speech-based automatic detection of Alzheimer's disease (AD). However, recognition errors could propagate downstream, potentially impacting the detection decisions. Recent studies have revealed a non-linear relationship between word error rates (WER) and AD detection performance, where ASR transcriptions with notable errors could still yield AD detection accuracy equivalent to that based on manual transcriptions. This work presents a series of analyses to explore the effect of ASR transcription errors in BERT-based AD detection systems. Our investigation reveals that not all ASR errors contribute equally to detection performance. Certain words, such as stopwords, despite constituting a large proportion of errors, are shown to play a limited role in distinguishing AD. In contrast, the keywords related to diagnosis tasks exhibit significantly greater importance relative to other words. These findings provide insights into the interplay between ASR errors and the downstream detection model.
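
Since the abstract's argument rests on word error rate, a minimal sketch of how WER is conventionally computed may help: the word-level edit distance (substitutions, deletions, insertions) between the manual and ASR transcripts, divided by the number of words in the manual transcript. The function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from the paper; the authors' scoring tool and text normalization are not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; whitespace tokenization and the
    absence of text normalization are simplifying assumptions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    manual = "the boy takes a cookie"
    asr = "a boy take cookie"
    print(f"WER = {word_error_rate(manual, asr):.2f}")
```

Note that WER weights every word equally, so a dropped stopword and a dropped task keyword cost the same; the abstract's point is that the downstream detector does not treat them the same.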

Paper Structure

This paper contains 12 sections, 1 equation, 5 figures, 5 tables.

Figures (5)

  • Figure 1: The top 20 error word distribution of the ASR system.
  • Figure 2: Samples of alignment map for participants s170 (a) and s179 (b). Squares represent correctly transcribed words, while crosses ('x') indicate ASR errors. Blue color for stopwords, red for keywords, and gray for other words.
  • Figure 3: BERT embedding variations as stopwords are incrementally removed/substituted from manual transcriptions (participant s172). Light to dark blue: fewer to more stopwords removed/substituted. 'x': ASR transcript. White/gray: healthy/AD decision regions.
  • Figure 4: Average hyperplane offset for transcription embeddings as stopwords (a) and keywords (b) are incrementally removed or substituted. Edit ratio represents the percentage of words removed/substituted.
  • Figure 5: BERT embedding variations as keywords are incrementally removed/substituted from manual transcriptions (participant s172). Light to dark blue: fewer to more keywords removed/substituted. 'x': ASR transcript. White/gray: healthy/AD decision regions.
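
The captions for Figures 3 to 5 describe removing a growing share of stopwords or keywords and tracking how far the resulting embedding sits from the classifier's hyperplane. The sketch below illustrates the two operations involved under stated assumptions: the edit ratio is taken as the fraction of target-category word occurrences that are removed, the decision boundary is assumed to be linear, and the names `remove_targets` and `hyperplane_offset` and the toy stopword set are hypothetical. It does not reproduce the authors' BERT embeddings, PCA projection, or substitution procedure.

```python
import math
import random


def remove_targets(words, targets, edit_ratio, seed=0):
    """Delete a fraction `edit_ratio` of the words that belong to `targets`.

    Assumption: the ratio is relative to the number of target-word
    occurrences, and the occurrences to delete are picked at random.
    """
    if not 0.0 <= edit_ratio <= 1.0:
        raise ValueError("edit_ratio must lie in [0, 1]")
    positions = [i for i, w in enumerate(words) if w.lower() in targets]
    n_edit = round(edit_ratio * len(positions))
    drop = set(random.Random(seed).sample(positions, n_edit))
    return [w for i, w in enumerate(words) if i not in drop]


def hyperplane_offset(x, w, b):
    """Signed distance of point x from the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("weight vector must be non-zero")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


if __name__ == "__main__":
    transcript = "the boy is on the stool".split()
    toy_stopwords = {"the", "is", "on"}  # illustrative set, not the paper's list
    for ratio in (0.0, 0.5, 1.0):
        print(ratio, " ".join(remove_targets(transcript, toy_stopwords, ratio)))
```

In a full experiment, each edited transcript would be embedded and projected before `hyperplane_offset` is applied with the trained classifier's weights; averaging the offsets per edit ratio would give curves of the kind Figure 4 describes.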