Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer's Disease
Junan Li, Yunxiang Li, Yuren Wang, Xixin Wu, Helen Meng
TL;DR
This work tackles scalable, non-invasive Alzheimer's screening from spoken language by introducing a compact, explainable feature set derived from Cookie Theft picture descriptions. It combines LLM-assisted topic keyword generation, content-coverage metrics (BLEU/METEOR), and TF-IDF-based similarity features to create interpretable predictors. Evaluated on the ADReSS Challenge 2020 dataset with two classifiers and Bayesian-optimized hyperparameters, the proposed features outperform a traditional linguistic feature set, achieving up to 85.4% accuracy with just 15 features. Feature-importance analyses confirm the value of the topic-related and TF-IDF features, and the approach's dimensional efficiency and interpretability suggest potential for scalable, transparent AD screening in clinical settings.
Abstract
Alzheimer's disease (AD) has become one of the most significant health challenges in an aging society. Spoken language-based AD detection methods have become increasingly common due to their scalability. Based on the Cookie Theft picture description task, we devised an explainable and effective feature set that leverages the visual capabilities of a large language model (LLM) and the Term Frequency-Inverse Document Frequency (TF-IDF) model. Our experimental results show that the newly proposed features consistently outperform traditional linguistic features across two different classifiers, with high dimensional efficiency. The new features can be explained and interpreted step by step, which enhances the interpretability of automatic AD screening.
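To make the idea of a TF-IDF-based similarity feature concrete, the following is a minimal sketch in Python using only the standard library. It is not the authors' implementation: the abstract does not specify how the features are computed, so the tokenizer, the smoothed IDF formula, the function names (`fit_idf`, `tfidf_vector`, `similarity_feature`), and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation (assumed scheme)."""
    tokens = (word.strip(PUNCT).lower() for word in text.split())
    return [tok for tok in tokens if tok]


def fit_idf(documents):
    """Compute a smoothed IDF per term: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Map a text to a sparse {term: tf * idf} vector; unseen terms are ignored."""
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    return {term: tf * idf[term] for term, tf in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_feature(transcript, reference, idf):
    """One scalar feature: TF-IDF cosine similarity of a transcript to a reference text."""
    return cosine(tfidf_vector(transcript, idf), tfidf_vector(reference, idf))


# Hypothetical reference descriptions of the picture (not taken from the dataset).
references = [
    "the boy steals cookies from the jar",
    "the mother washes dishes while the sink overflows",
]
idf = fit_idf(references)
score = similarity_feature("the boy takes cookies", references[0], idf)
```

Each such score is a single number that can be traced back to the overlapping terms and their weights, which is the sense in which a feature of this kind is interpretable.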
