Enhancing and Exploring Mild Cognitive Impairment Detection with W2V-BERT-2.0

Yueguan Wang, Tatsunari Matsushima, Soichiro Matsushima, Toshimitsu Sakai

TL;DR

This work demonstrates cross-lingual mild cognitive impairment (MCI) detection from speech using W2V-BERT-2.0 self-supervised learning (SSL) features, avoiding the bottleneck of depending on transcriptions. It introduces an efficient layer-selection strategy based on learned layer weights and a purpose-built OR inference logic that leverages segment-level cues, achieving competitive results on the TAUKADIAL dataset. The study also analyzes speaker bias and data-split sensitivity, highlighting robust feature extraction and fair evaluation as key challenges for MCI classification with SSL representations. Overall, the approach advances language-independent MCI screening from audio and outlines practical directions to improve reliability and fairness in future research.

Abstract

This study explores a multi-lingual audio self-supervised learning model for detecting mild cognitive impairment (MCI) using the TAUKADIAL cross-lingual dataset. While speech transcription-based detection with BERT models is effective, limitations exist due to a lack of transcriptions and temporal information. To address these issues, the study utilizes features directly from speech utterances with W2V-BERT-2.0. We propose a visualization method to detect essential layers of the model for MCI classification and design a specific inference logic considering the characteristics of MCI. The experiment shows competitive results, and the proposed inference logic significantly contributes to the improvements from the baseline. We also conduct detailed analysis which reveals the challenges related to speaker bias in the features and the sensitivity of MCI classification accuracy to the data split, providing valuable insights for future research.
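The abstract does not spell out the inference logic; the TL;DR describes it as an OR logic over segment-level cues. One possible reading can be sketched as follows, assuming each recording is split into segments, a classifier returns an MCI probability per segment, and the recording is labeled MCI as soon as any single segment crosses a threshold. The function names, the 0.5 threshold, and the majority-vote comparison are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def or_inference(segment_probs: Sequence[float], threshold: float = 0.5) -> int:
    """Label a recording as MCI (1) if ANY segment reaches the threshold.

    Assumption: segment_probs holds per-segment MCI probabilities
    produced by some upstream classifier (hypothetical here).
    """
    if not segment_probs:
        raise ValueError("need at least one segment probability")
    return int(any(p >= threshold for p in segment_probs))


def majority_inference(segment_probs: Sequence[float], threshold: float = 0.5) -> int:
    """Baseline for comparison: MCI (1) only if more than half the segments agree."""
    if not segment_probs:
        raise ValueError("need at least one segment probability")
    votes = sum(p >= threshold for p in segment_probs)
    return int(votes * 2 > len(segment_probs))


# One segment out of three shows an MCI cue.
probs = [0.1, 0.7, 0.2]
or_label = or_inference(probs)              # OR logic flags the recording
majority_label = majority_inference(probs)  # majority vote does not
```

Under this reading, the OR rule trades specificity for sensitivity: a single strong segment-level cue is enough to flag a speaker, which a majority vote would average away.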

Paper Structure

This paper contains 15 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Visualization of how the layer weights change during training. The weights around the 18th layer gradually increase as training proceeds.
  • Figure 2: Accuracy results of annotated features for each fold. LIN, ACO, FL and ALL denote linguistic, acoustic, fluency and all features, respectively.
  • Figure 3: Accuracy on the validation set and test set under different inference logics with single-layer features. The peaks are marked as $(layer_{id}, acc)$.
  • Figure 4: The layer weights of the final epoch in cross validation folds.
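
Figures 1 and 4 refer to per-layer weights that are learned during training. A common way to obtain such weights is a softmax-normalized weighted sum over the hidden layers of the SSL model; the sketch below assumes that formulation, which this page does not confirm. All names (`softmax`, `weighted_layer_sum`, `top_layer`) and the toy values are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn unconstrained layer logits into weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(
    layer_features: Sequence[Sequence[float]],
    layer_logits: Sequence[float],
) -> List[float]:
    """Combine one feature vector per layer into a single vector.

    Assumption: layer_logits are trainable scalars, one per layer,
    normalized with a softmax before weighting.
    """
    if len(layer_features) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(layer_features[0])
    return [
        sum(w * feats[d] for w, feats in zip(weights, layer_features))
        for d in range(dim)
    ]


def top_layer(layer_logits: Sequence[float]) -> int:
    """Index of the layer with the largest weight (0-based)."""
    weights = softmax(layer_logits)
    return max(range(len(weights)), key=weights.__getitem__)


# Toy example: three layers, two feature dimensions each.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
logits = [0.0, 2.0, 1.0]  # would be learned in practice
pooled = weighted_layer_sum(features, logits)
best = top_layer(logits)  # layer 1 carries the largest weight here
```

Plotting the softmax weights after each epoch gives the kind of curve Figure 1 describes, and the index with the largest final weight is a natural candidate when selecting a single layer.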