Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification?
Changye Li, Weizhe Xu, Trevor Cohen, Serguei Pakhomov
TL;DR
The paper investigates whether automatic speech recognition (ASR) errors can improve downstream dementia classification on the Cookie Theft picture description task. It compares pre-trained and domain-adapted ASR models with beam-search decoding against manual transcripts, applying a BERT classifier to the resulting text. The counterintuitive finding is that imperfect ASR transcripts often yield higher classification accuracy and AUC than verbatim manual transcripts. SHAP-based error analysis and content-unit evaluations indicate that systematic ASR errors capture linguistically and acoustically informative cues related to dementia, and transcript-level explanations aid interpretability. The results point to a practical synergy between ASR and classification models: carefully designed ASR pipelines could support scalable screening for cognitive impairment, subject to limitations in data quality and generalizability.
Abstract
\textbf{Objectives}: We aimed to investigate how errors from automatic speech recognition (ASR) systems affect dementia classification accuracy in the ``Cookie Theft'' picture description task. Specifically, we assessed whether imperfect ASR-generated transcripts carry information useful for distinguishing language samples of cognitively healthy individuals from those of individuals with Alzheimer's disease (AD).
\textbf{Methods}: We ran experiments with several ASR models and refined their transcripts with post-editing techniques. Both the imperfect ASR transcripts and manual transcripts served as inputs to the downstream dementia classifier. We then conducted a comprehensive error analysis to compare model performance and to assess how effective ASR-generated transcripts are for dementia classification.
\textbf{Results}: Imperfect ASR-generated transcripts surprisingly outperformed manual transcripts at distinguishing individuals with AD from those without in the ``Cookie Theft'' task. The ASR-based models surpassed the previous state-of-the-art approach, indicating that ASR errors may contain valuable cues related to dementia. The synergy between ASR and classification models improved overall classification accuracy.
\textbf{Conclusion}: Imperfect ASR transcripts effectively capture linguistic anomalies linked to dementia, improving classification accuracy. This synergy between ASR and classification models underscores ASR's potential as a tool for assessing cognitive impairment and for related clinical applications.
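To make the notion of an "imperfect" transcript concrete, the sketch below computes word error rate (WER), a common way to measure how far an ASR transcript departs from a manual one. It is an illustration only: the abstract does not state which error metric the authors used, the function name `word_error_rate` is hypothetical, and the two example sentences are invented rather than taken from the study's data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: simple lowercasing and whitespace tokenization; real
    evaluations usually apply more careful text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented example pair: a manual transcript and a noisy ASR output.
manual = "the boy is taking cookies from the jar"
asr = "the boy is taken cookie from jar"
print(word_error_rate(manual, asr))  # 2 substitutions + 1 deletion over 8 words
```

A nonzero WER here only says that the two transcripts differ. Whether those differences help or hurt the downstream classifier is the separate question the paper studies.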
