Evaluating Spoken Language as a Biomarker for Automated Screening of Cognitive Impairment

Maria R. Lima; Alexander Capstick; Fatemeh Geranmayeh; Ramin Nilforooshan; Maja Matarić; Ravi Vaidyanathan; Payam Barnaghi

TL;DR

The paper addresses the need for scalable, non-invasive screening of cognitive impairment by using spoken language biomarkers. It develops an interpretable ML pipeline that prioritizes lexical linguistic features and SHAP explanations to predict ADRD risk and MMSE severity from DementiaBank data, with external validation and a real-world pilot. Results show strong ADRD discrimination on a held-out DementiaBank test set (ROC-AUC ~0.86) and reasonable MMSE prediction error (MAE ~3.7 points), with risk stratification (Green/Red) to aid clinical triage. The work demonstrates the potential for in-home cognitive health monitoring via conversational AI. It acknowledges limitations in generalizability, ASR noise, and pilot-scale validation, and outlines paths for extension to longitudinal, multilingual, and multi-modal settings.

Abstract

Timely and accurate assessment of cognitive impairment is a major unmet need in populations at risk. Alterations in speech and language can be early predictors of Alzheimer's disease and related dementias (ADRD) before clinical signs of neurodegeneration. Voice biomarkers offer a scalable and non-invasive solution for automated screening. However, the clinical applicability of machine learning (ML) remains limited by challenges in generalisability, interpretability, and access to patient data to train clinically applicable predictive models. Using DementiaBank recordings (N=291, 64% female), we evaluated ML techniques for ADRD screening and severity prediction from spoken language. We validated model generalisability with pilot data collected in-residence from older adults (N=22, 59% female). Risk stratification and linguistic feature importance analysis enhanced the interpretability and clinical utility of predictions. For ADRD classification, a Random Forest applied to lexical features achieved a mean sensitivity of 69.4% (95% confidence interval (CI) = 66.4-72.5) and specificity of 83.3% (78.0-88.7). On real-world pilot data, this model achieved a mean sensitivity of 70.0% (58.0-82.0) and specificity of 52.5% (39.3-65.7). For severity prediction using Mini-Mental State Examination (MMSE) scores, a Random Forest Regressor achieved a mean absolute MMSE error of 3.7 (3.7-3.8), with comparable performance of 3.3 (3.1-3.5) on pilot data. Linguistic features associated with higher ADRD risk included increased use of pronouns and adverbs, greater disfluency, reduced analytical thinking, lower lexical diversity and fewer words reflecting a psychological state of completion. Our interpretable predictive modelling offers a novel approach for in-home integration with conversational AI to monitor cognitive health and triage higher-risk individuals, enabling earlier detection and intervention.
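The sensitivity and specificity figures above are reported as means with 95% confidence intervals. As a rough illustration of how such numbers can be derived, the sketch below computes both metrics from binary labels and summarises repeated evaluation runs with a normal-approximation interval. The function names, the toy labels, and the choice of a normal approximation are assumptions made for illustration; the abstract does not state how the authors computed their intervals.

```python
import math
import statistics


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = ADRD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against a class being absent from the evaluation split.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI over repeated runs (assumed method)."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, mean, mean
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Toy labels and per-run scores, invented purely for illustration.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")

    per_run_sensitivity = [0.66, 0.70, 0.72, 0.69, 0.71]
    mean, low, high = mean_with_ci(per_run_sensitivity)
    print(f"mean={mean:.3f} CI=({low:.3f}, {high:.3f})")
```

With few runs, a t-distribution or bootstrap interval would usually be more appropriate than the fixed z = 1.96 used here.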
