Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech
Cong Zhang, Wenxing Guo, Hongsheng Dai
TL;DR
This work tackles automatic detection of Mild Cognitive Impairment (MCI) from spontaneous speech within the TAUKADIAL challenge, using high-dimensional acoustic features extracted with openSMILE (emobase with 988 features and eGeMAPSv02 with 88 features). It compares five classifiers, namely Random Forest (RF), Sparse Logistic Regression (SLR), k-Nearest Neighbors (KNN), Sparse Support Vector Machine (SSVM), and Decision Tree (DT), across three experiments: a language-agnostic evaluation, a language-aware evaluation, and an out-of-sample robustness evaluation. Across experiments, RF and SLR consistently outperform the other methods on the high-dimensional data, KNN performs worst, and the language-aware setup with a language-detection step generally maintains strong performance. The findings support the feasibility of a fully automatic, speech-based screening tool for MCI. Future directions include cross-validation variants, integration of demographic and cognitive scores, and exploration of additional acoustic features to improve generalization.
Abstract
This study addresses the TAUKADIAL challenge, focusing on classifying speech from people with Mild Cognitive Impairment (MCI) and neurotypical controls. We conducted three experiments comparing five machine-learning methods: Random Forest, Sparse Logistic Regression, k-Nearest Neighbors, Sparse Support Vector Machine, and Decision Tree, all trained on 1076 acoustic features extracted automatically with openSMILE. In Experiment 1, the entire dataset was used to train a language-agnostic model. Experiment 2 introduced a language detection step, so that a separate model was trained for each language. Experiment 3 extended the language-agnostic model from Experiment 1, with a specific focus on evaluating the robustness of the models on out-of-sample test data. Across all three experiments, results consistently favored models capable of handling high-dimensional data, such as Random Forest and Sparse Logistic Regression, for distinguishing the speech of people with MCI from that of controls.
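
The difference between the language-agnostic setup (Experiment 1) and the language-aware setup (Experiment 2) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the `NearestCentroid` class is a hypothetical stand-in rather than one of the five classifiers compared in the study, the language labels `lang_a` and `lang_b` are placeholders, and the two-dimensional toy vectors stand in for the 1076 openSMILE features. The toy data says nothing about the reported results.

```python
from collections import defaultdict
from statistics import fmean

# Feature counts stated in the text: emobase (988) + eGeMAPSv02 (88).
N_EMOBASE, N_EGEMAPS = 988, 88
N_FEATURES = N_EMOBASE + N_EGEMAPS  # 1076, matching the abstract


class NearestCentroid:
    """Hypothetical stand-in classifier (NOT one of the paper's five methods)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # One mean feature vector per class label.
        self.centroids_ = {
            label: [fmean(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))

        # Assign each row to the class with the closest centroid.
        return [
            min(self.centroids_, key=lambda c: dist2(row, self.centroids_[c]))
            for row in X
        ]


def fit_language_agnostic(X, y):
    """Experiment 1 style: a single model trained on all samples."""
    return NearestCentroid().fit(X, y)


def fit_language_aware(X, y, langs):
    """Experiment 2 style: one model per (detected) language."""
    by_lang = defaultdict(lambda: ([], []))
    for row, label, lang in zip(X, y, langs):
        by_lang[lang][0].append(row)
        by_lang[lang][1].append(label)
    return {
        lang: NearestCentroid().fit(X_l, y_l)
        for lang, (X_l, y_l) in by_lang.items()
    }


def predict_language_aware(models, X, langs):
    """Route each sample to the model trained for its language."""
    return [models[lang].predict([row])[0] for row, lang in zip(X, langs)]


# Toy data (assumption): 2 features instead of 1076, made-up values.
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [1.2, 0.9],
     [5.0, 5.1], [5.2, 4.9], [6.0, 6.1], [6.2, 5.9]]
y = ["NC", "NC", "MCI", "MCI", "NC", "NC", "MCI", "MCI"]
langs = ["lang_a"] * 4 + ["lang_b"] * 4

agnostic_model = fit_language_agnostic(X, y)
models = fit_language_aware(X, y, langs)
print(agnostic_model.predict(X))
print(predict_language_aware(models, X, langs))
```

In practice the language label would come from an automatic language-detection step, and each per-language model would be one of the classifiers named above; the routing logic is the only part this sketch is meant to show.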
