Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer's Disease

Junan Li, Yunxiang Li, Yuren Wang, Xixin Wu, Helen Meng

TL;DR

This work tackles scalable, non-invasive Alzheimer's screening from spoken language by introducing a compact, explainable feature set derived from Cookie Theft picture descriptions. It combines LLM-assisted topic keyword generation, content-coverage metrics (BLEU/METEOR), and TF-IDF-based similarity features to create interpretable predictors. Evaluated on the ADReSS Challenge 2020 dataset with two classifiers and Bayesian-optimized hyperparameters, the proposed features outperform a traditional linguistic feature set, achieving up to 85.4% accuracy with just 15 features. The approach is dimensionally efficient and interpretable: feature-importance analyses confirm the value of the topic-related and TF-IDF features, and the results show potential for scalable, transparent AD screening in clinical settings.

Abstract

Alzheimer's disease (AD) has become one of the most significant health challenges in an aging society. Spoken language-based AD detection methods have gained prevalence due to their scalability. Based on the Cookie Theft picture description task, we devised an explainable and effective feature set that leverages the visual capabilities of a large language model (LLM) and the Term Frequency-Inverse Document Frequency (TF-IDF) model. Our experimental results show that the newly proposed features consistently outperform traditional linguistic features across two different classifiers, with high dimensional efficiency. Our new features can be explained and interpreted step by step, which enhances the interpretability of automatic AD screening.
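To make the idea of topic-keyword and TF-IDF features concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the keyword list, the two example transcripts, and the function names (`keyword_coverage`, `tfidf_vectors`, `cosine`) are all hypothetical, and the paper's exact feature definitions are not reproduced in this summary. The sketch computes (1) the fraction of topic keywords a transcript mentions and (2) the TF-IDF cosine similarity between a transcript and a keyword reference document.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_coverage(transcript, keywords):
    """Fraction of the topic keywords that appear in the transcript."""
    if not keywords:
        return 0.0
    tokens = set(tokenize(transcript))
    return sum(1 for k in keywords if k in tokens) / len(keywords)


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (n / len(toks)) * idf[t] for t, n in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical topic keywords, standing in for LLM-generated ones.
KEYWORDS = ["cookie", "jar", "stool", "sink", "water", "mother", "dishes"]

# Two made-up transcripts: one content-rich, one vague.
RICH = ("the boy is on the stool taking a cookie from the jar while "
        "the mother dries dishes and water overflows from the sink")
SPARSE = "there is a boy and um a lady and something is happening there"

reference = " ".join(KEYWORDS)
ref_vec, rich_vec, sparse_vec = tfidf_vectors([reference, RICH, SPARSE])

features_rich = (keyword_coverage(RICH, KEYWORDS), cosine(ref_vec, rich_vec))
features_sparse = (keyword_coverage(SPARSE, KEYWORDS), cosine(ref_vec, sparse_vec))
```

In this toy setup, the content-rich description scores higher on both features than the vague one. Each value traces back to specific words in the transcript, which is the kind of step-by-step interpretability the abstract describes.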

Paper Structure

This paper contains 14 sections, 7 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Schematic diagram of topic keyword generation. (a) shows the Cookie Theft picture and how it is segmented. (b), (c) and (d) show the cropped sub-pictures. The three sub-pictures are sent to the LLM with instructions for generating keywords.
  • Figure 2: Top 15 features by Random Forest importance and by ANOVA F-value. The charts display the top 15 features ranked by their importance in the Random Forest model (a) and by their ANOVA F-values (b). The names of the newly proposed features are shown in blue, and their corresponding bars are drawn in distinct colors.
  • Figure 3: Accuracy results of the ablation study. The x-axis indicates the number of features added to the experiment.