Tackling Cognitive Impairment Detection from Speech: A submission to the PROCESS Challenge
Catarina Botelho, David Gimeno-Gómez, Francisco Teixeira, John Mendonça, Patrícia Pereira, Diogo A. P. Nunes, Thomas Rolland, Anna Pompili, Rubén Solera-Ureña, Maria Ponte, David Martins de Matos, Carlos-D. Martínez-Hinarejos, Isabel Trancoso, Alberto Abad
TL;DR
This work presents a PROCESS-2024 submission that aims to detect cognitive decline from spontaneous speech across three elicitation tasks. It combines knowledge-based acoustic and textual features, macrolinguistic descriptors derived from LLM prompts, pause-based biomarkers, and multiple neural representations, using diverse classifiers and a late-fusion ensemble to exploit complementary information. The two best-performing ensembles (each combining six single systems) improve dementia-class performance by integrating Longformer CTD representations, ECAPA-TDNN/TRILLsson embeddings, and pause and macrolinguistic-descriptor features. The study highlights the value of multimodal, task-diverse representations for early dementia detection. It also acknowledges limitations imposed by the dataset, which motivate validation on larger and more demographically diverse corpora.
Abstract
This work describes our group's submission to the PROCESS Challenge 2024, which addresses the assessment of cognitive decline from spontaneous speech elicited through three guided clinical tasks. This joint effort followed a holistic approach, encompassing knowledge-based acoustic and text-based feature sets, LLM-based macrolinguistic descriptors, pause-based acoustic biomarkers, and multiple neural representations (e.g., Longformer, ECAPA-TDNN, and TRILLsson embeddings). Combining these feature sets with different classifiers yielded a large pool of models, from which we selected those offering the best balance across training, development, and per-class performance. Our results show that our best-performing systems are combinations of mutually complementary models that rely on acoustic and textual information from all three clinical tasks.
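The late-fusion idea of combining complementary single systems can be illustrated with a minimal sketch. It assumes that each of six single systems outputs class posteriors for every recording and that these are fused by a weighted average followed by an argmax. The class labels (`HC`, `MCI`, `Dementia`), the helper name `late_fusion`, the equal default weights, and the example posteriors are all hypothetical illustrations, not the fusion rule or data reported in this work.

```python
import numpy as np

# Hypothetical class order, used only for illustration.
CLASSES = ["HC", "MCI", "Dementia"]


def late_fusion(posteriors, weights=None):
    """Fuse per-system class posteriors by weighted averaging (a sketch).

    posteriors: array-like of shape (n_systems, n_samples, n_classes)
    weights: optional array-like of shape (n_systems,); equal weights if None
    Returns (predicted class indices, fused posteriors).
    """
    p = np.asarray(posteriors, dtype=float)
    if p.ndim != 3:
        raise ValueError("expected shape (n_systems, n_samples, n_classes)")
    w = np.ones(p.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so fused scores remain a probability distribution
    # Contract the system axis: (S,) x (S, N, C) -> (N, C)
    fused = np.tensordot(w, p, axes=1)
    return fused.argmax(axis=1), fused


# Made-up posteriors from six hypothetical systems for two recordings.
sys_probs = np.array([
    [[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]],
    [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]],
    [[0.5, 0.4, 0.1], [0.3, 0.4, 0.3]],
    [[0.4, 0.4, 0.2], [0.2, 0.2, 0.6]],
    [[0.6, 0.2, 0.2], [0.1, 0.5, 0.4]],
    [[0.3, 0.5, 0.2], [0.2, 0.3, 0.5]],
])

preds, fused = late_fusion(sys_probs)
labels = [CLASSES[i] for i in preds]  # ["HC", "Dementia"]
```

Averaging posteriors (soft voting) lets a system that is confident on one class compensate for others that are uncertain, which is one simple way complementary models can help. Majority voting or a learned stacking classifier are common alternatives.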
