Connected Speech-Based Cognitive Assessment in Chinese and English

Saturnino Luz, Sofia De La Fuente Garcia, Fasih Haider, Davida Fromm, Brian MacWhinney, Alyssa Lanzi, Ya-Ning Chang, Chia-Ju Chou, Yi-Chien Liu

TL;DR

The paper tackles cross-language cognitive assessment from connected speech by introducing a Chinese–English benchmark with propensity-score-matched samples for two tasks: diagnosing mild cognitive impairment (MCI) and predicting Mini-Mental State Examination (MMSE) scores. It proposes a language-general modelling framework that fuses acoustic features (eGeMAPS; wav2vec) and linguistic features derived from ASR and POS tagging in a single MLP architecture. The baseline results are UAR ≈ 59–60% for MCI classification and RMSE ≈ 2.89 for MMSE score prediction, with linguistic features performing strongly for regression and wav2vec contributing when combined with other features. By releasing the dataset and baseline methods, the work supports broader validation and development of cross-lingual speech biomarkers for cognitive function.

Abstract

We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment as well as individuals with normal cognition. These data have been carefully matched by age and sex using propensity score analysis to ensure balance and representativity in model training. The prediction tasks encompass mild cognitive impairment diagnosis and cognitive test score prediction. This framework was designed to encourage the development of approaches to speech-based cognitive assessment which generalise across languages. We illustrate it by presenting baseline prediction models that employ language-agnostic and comparable features for diagnosis and cognitive test score prediction. The models achieved an unweighted average recall of 59.2% in diagnosis and a root mean squared error of 2.89 in score prediction.
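
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how unweighted average recall (UAR) and root mean squared error (RMSE) are conventionally computed. It is not the authors' evaluation code; the function names, the class labels "MCI" and "NC", and the toy scores are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: every class counts equally, whatever its size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Average the recall of each class that occurs in the ground truth.
    return sum(correct[c] / total[c] for c in total) / len(total)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared difference between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data (not taken from the benchmark).
labels = ["MCI", "MCI", "MCI", "NC"]
predicted = ["MCI", "MCI", "NC", "NC"]
uar = unweighted_average_recall(labels, predicted)  # (2/3 + 1/1) / 2

scores = [28, 25, 30]
estimates = [27, 27, 30]
rmse = root_mean_squared_error(scores, estimates)  # sqrt((1 + 4 + 0) / 3)
```

Because UAR averages recall over classes rather than over samples, a classifier that always predicts the majority class scores 50% on a two-class task, which is the natural chance level against which a figure such as 59.2% can be read.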

Paper Structure (11 sections, 2 figures, 3 tables)

This paper contains 11 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: General architecture for multilingual cognitive assessment based on recorded speech.
  • Figure 2: Venn diagram showing the effect of each feature set on classification with respect to Ground Truth (GT).