Infusing Acoustic Pause Context into Text-Based Dementia Assessment

Franziska Braun; Sebastian P. Bayerl; Florian Hönig; Hartmut Lehfeld; Thomas Hillemacher; Tobias Bocklet; Korbinian Riedhammer

TL;DR

The paper tackles non-invasive dementia screening by infusing acoustic pause context into text-based transformer models. It presents a pause-enriched multimodal pipeline that combines German BERT text embeddings with wav2vec 2.0 (W2V2) audio representations, uses pause-token baselines (P1–P4) and disfluency tokens, and compares self-attention and cross-attention configurations on data from a Verbal Fluency Test (VFT) and a Picture Description Test (PDT). The subjects have no cognitive impairment (NC), mild cognitive impairment (MCI), or Alzheimer's dementia (AD). The gains are task-dependent: NC vs. MCI benefits from acoustic context in the VFT, MCI vs. AD is classified best on the PDT with pause modeling and disfluencies, and NC vs. AD remains robust with pause modeling, though W2V2 is generally weaker. The work highlights pause-informed modeling as a promising, scalable cue for dementia detection, and names confounds and dataset size as the key limitations for real-world clinical adoption.

Abstract

Speech pauses, alongside content and structure, offer a valuable and non-invasive biomarker for detecting dementia. This work investigates the use of pause-enriched transcripts in transformer-based language models to differentiate the cognitive states of subjects with no cognitive impairment, mild cognitive impairment, and Alzheimer's dementia based on their speech from a clinical assessment. We address three binary classification tasks: onset, monitoring, and dementia exclusion. Performance is evaluated through experiments on a German Verbal Fluency Test and a Picture Description Test, comparing the model's effectiveness across different speech production contexts. Starting from a textual baseline, we investigate the effect of incorporating pause information and acoustic context. We show that the test should be chosen depending on the task and that, similarly, lexical pause information and acoustic cross-attention contribute differently.
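
To make the idea of a pause-enriched transcript concrete, the following is a minimal sketch, not the authors' pipeline. It assumes word-level timestamps are available (for example from a forced aligner) and inserts a pause token between two words whenever the silence between them exceeds a threshold. The token names (`[PAUSE_S]`, `[PAUSE_M]`, `[PAUSE_L]`), the three duration thresholds, and the function names are hypothetical choices for illustration; this page does not define the P1–P4 schemes, so they are not reproduced here.

```python
from typing import List, Optional, Tuple

# A word with timing: (text, start time in seconds, end time in seconds).
Word = Tuple[str, float, float]

# Hypothetical thresholds in seconds for short, medium, and long pauses.
DEFAULT_THRESHOLDS = (0.05, 0.5, 2.0)


def pause_token(gap: float,
                thresholds: Tuple[float, float, float] = DEFAULT_THRESHOLDS
                ) -> Optional[str]:
    """Map a silence duration to a pause token, or None if it is too short."""
    short, medium, long_ = thresholds
    if gap < short:
        return None
    if gap < medium:
        return "[PAUSE_S]"
    if gap < long_:
        return "[PAUSE_M]"
    return "[PAUSE_L]"


def enrich_transcript(words: List[Word],
                      thresholds: Tuple[float, float, float] = DEFAULT_THRESHOLDS
                      ) -> str:
    """Return the transcript with pause tokens inserted between words."""
    pieces: List[str] = []
    prev_end: Optional[float] = None
    for text, start, end in words:
        if prev_end is not None:
            token = pause_token(start - prev_end, thresholds)
            if token is not None:
                pieces.append(token)
        pieces.append(text)
        prev_end = end
    return " ".join(pieces)


# Illustrative verbal-fluency style input (animal names with made-up timings).
example = [
    ("Hund", 0.0, 0.4),
    ("Katze", 0.5, 0.9),
    ("Maus", 2.0, 2.3),
    ("Pferd", 5.0, 5.5),
]
enriched = enrich_transcript(example)
```

A transcript enriched this way could then be passed to a text model's tokenizer, provided the pause tokens are registered as special tokens so they are not split into subwords. Whether a lexical token of this kind or an acoustic representation of the pause region helps more is, according to the abstract, task-dependent.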
