Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech

Cong Zhang; Wenxing Guo; Hongsheng Dai

TL;DR

This work tackles automatic detection of Mild Cognitive Impairment (MCI) from spontaneous speech within the TAUKADIAL challenge, using high-dimensional acoustic features extracted with openSMILE (emobase with 988 features and eGeMAPSv02 with 88 features, 1076 in total). It compares five classifiers, namely Random Forest (RF), Sparse Logistic Regression (SLR), k-Nearest Neighbors (KNN), Sparse Support Vector Machine (SSVM), and Decision Tree (DT), across three experiments: a language-agnostic model, language-aware models trained after a language-detection step, and an out-of-sample robustness evaluation. Across experiments, RF and SLR consistently outperform the other classifiers on the high-dimensional data, KNN performs worst, and performance generally remains strong when the language-detection step is used. The findings support the feasibility of a fully automatic, speech-based screening tool for MCI. Proposed future directions include cross-validation variants, integration of demographic and cognitive scores, and exploration of additional acoustic features to improve generalization.

Abstract

This study addresses the TAUKADIAL challenge, focusing on the classification of speech from people with Mild Cognitive Impairment (MCI) and neurotypical controls. We conducted three experiments comparing five machine-learning methods: Random Forest, Sparse Logistic Regression, k-Nearest Neighbors, Sparse Support Vector Machine, and Decision Tree, using 1076 acoustic features automatically extracted with openSMILE. In Experiment 1, the entire dataset was used to train a language-agnostic model. Experiment 2 introduced a language detection step, leading to separate model training for each language. Experiment 3 further enhanced the language-agnostic model from Experiment 1, with a specific focus on evaluating the robustness of the models using out-of-sample test data. Across all three experiments, results consistently favored models capable of handling high-dimensional data, such as Random Forest and Sparse Logistic Regression, in distinguishing speech from people with MCI from that of controls.

Paper Structure (19 sections, 3 equations, 3 figures, 3 tables)

Introduction
Mild Cognitive Impairment
Automatic classification
The current study
Methods
Data
Language identification
Feature extraction
Feature selection
Classification methods
Evaluation metrics
Experiments
Experiment 1
Experiment 2
Experiment 3
...and 4 more sections

Figures (3)

Figure 1: F1 score over 100 random splits of the data. Train-test ratio denotes the proportions of the training and test sets.
Figure 2: Specificity over 100 random splits of the data. Train-test ratio denotes the proportions of the training and test sets.
Figure 3: Unweighted Average Recall over 100 random splits of the data. Train-test ratio denotes the proportions of the training and test sets.
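
The three figures report F1, specificity, and Unweighted Average Recall (UAR) over repeated random train-test splits. The sketch below shows one way such an evaluation could be set up in plain Python. It is an illustration under stated assumptions, not the authors' code. The metric formulas are the standard binary definitions, with MCI assumed to be the positive class (label 1). The helper names (`evaluate`, `random_split`, `repeated_evaluation`), the `fit_predict` callback, and the default train ratio of 0.8 are hypothetical. The split here is a simple unstratified shuffle, which may differ from the paper's procedure.

```python
import random
from statistics import mean


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred, positive=1):
    """Standard binary F1, specificity, and UAR (mean of per-class recalls)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = safe_div(tp, tp + fn)  # recall on the positive class
    specificity = safe_div(tn, tn + fp)  # recall on the negative class
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    uar = (sensitivity + specificity) / 2
    return {"f1": f1, "specificity": specificity, "uar": uar}


def random_split(n_items, train_ratio, rng):
    """Shuffle item indices and cut them into train and test index lists."""
    indices = list(range(n_items))
    rng.shuffle(indices)
    cut = int(round(train_ratio * n_items))
    return indices[:cut], indices[cut:]


def repeated_evaluation(labels, fit_predict, train_ratio=0.8,
                        n_repeats=100, seed=0):
    """Average each metric over n_repeats random splits.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains
    any classifier on the training indices and returns predicted labels for
    the test indices, in order.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(n_repeats):
        train_idx, test_idx = random_split(len(labels), train_ratio, rng)
        y_true = [labels[i] for i in test_idx]
        y_pred = fit_predict(train_idx, test_idx)
        runs.append(evaluate(y_true, y_pred))
    return {name: mean(run[name] for run in runs) for name in runs[0]}
```

For example, `evaluate([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])` gives 0.75 for all three metrics, because sensitivity, specificity, and precision are each 3/4. Calling `repeated_evaluation` once per train-test ratio would yield the kind of per-ratio averages that the figures summarize.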
