Speech as a Biomarker for Disease Detection

Catarina Botelho; Alberto Abad; Tanja Schultz; Isabel Trancoso

TL;DR

Speech is a rich, noninvasive biomarker for diseases affecting the respiratory, nervous, and muscular systems, but practical deployment is hindered by non-specific speech changes and a lack of interpretability. The authors propose an interpretable framework that defines reference speech via reference intervals (RIs) derived from a reference population, with RI limits set at the $2.5^{th}$ and $97.5^{th}$ percentiles, and uses Neural Additive Models (NAMs) to detect disease signatures as deviations from these RIs. They apply the framework to Alzheimer's disease and Parkinson's disease using publicly available corpora (ADReSS and PC-GITA) and provide transparent, feature-level explanations of decisions. The work lays the groundwork for dataset-agnostic, multi-disease speech diagnostics that can serve as a second opinion for clinicians and support broader health monitoring.
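As a rough illustration of the reference-interval idea described above, the sketch below computes per-feature RIs as the 2.5th and 97.5th percentiles of a reference population and scores a new speaker against them. The synthetic data, the function names, and the deviation measure (distance outside the interval, scaled by interval width) are assumptions made for this example; the summary does not specify how the authors quantify deviations.

```python
import numpy as np

def reference_interval(reference_values):
    """Per-feature RI limits: 2.5th and 97.5th percentiles of the reference population."""
    lower, upper = np.percentile(reference_values, [2.5, 97.5], axis=0)
    return lower, upper

def deviation_from_ri(x, lower, upper):
    """Hypothetical deviation score: 0 inside the RI, signed distance outside it,
    expressed in units of the RI width (an assumption, not the authors' definition)."""
    width = upper - lower
    below = np.minimum(x - lower, 0.0)  # negative when x falls under the lower limit
    above = np.maximum(x - upper, 0.0)  # positive when x exceeds the upper limit
    return (below + above) / width

# Synthetic stand-in for a reference population: 500 speakers, 2 made-up features.
rng = np.random.default_rng(0)
reference = rng.normal(loc=[0.5, 120.0], scale=[0.1, 15.0], size=(500, 2))

lower, upper = reference_interval(reference)

# A new speaker whose first feature is typical and whose second is far above the RI.
new_speaker = np.array([0.52, 190.0])
dev = deviation_from_ri(new_speaker, lower, upper)
print("RI lower limits:", lower)
print("RI upper limits:", upper)
print("Deviation per feature:", dev)
```

By construction, about 95% of the reference population falls inside each interval, so a nonzero score flags a value that is atypical relative to the reference speakers.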

Abstract

Speech is a rich biomarker that encodes substantial information about the health of a speaker, and thus it has been proposed for the detection of numerous diseases, achieving promising results. However, questions remain about what the models trained for the automatic detection of these diseases are actually learning and the basis for their predictions, which can significantly impact patients' lives. This work advocates for an interpretable health model, suitable for detecting several diseases, motivated by the observation that speech-affecting disorders often have overlapping effects on speech signals. A framework is presented that first defines "reference speech" and then leverages this definition for disease detection. Reference speech is characterized through reference intervals, i.e., the typical values of clinically meaningful acoustic and linguistic features derived from a reference population. This novel approach in the field of speech as a biomarker is inspired by the use of reference intervals in clinical laboratory science. Deviations of new speakers from this reference model are quantified and used as input to detect Alzheimer's and Parkinson's disease. The classification strategy explored is based on Neural Additive Models, a type of glass-box neural network, which enables interpretability. The proposed framework for reference speech characterization and disease detection is designed to support the medical community by providing clinically meaningful explanations that can serve as a valuable second opinion.
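To make the "glass-box" property of Neural Additive Models more concrete, here is a minimal forward-pass sketch in NumPy. It assumes the standard NAM structure, in which each input feature passes through its own small network and the outputs are summed into a single logit. The layer sizes, random untrained weights, and function names are illustrative assumptions and do not reproduce the authors' architecture or training setup.

```python
import numpy as np

def init_feature_net(rng, hidden=8):
    """Randomly initialised one-hidden-layer network for a single feature (untrained)."""
    return {
        "w1": rng.normal(scale=0.5, size=(1, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.5, size=(hidden, 1)),
    }

def feature_contribution(net, x_col):
    """Shape function f_i(x_i): maps one feature column to its additive contribution."""
    hidden = np.maximum(x_col[:, None] @ net["w1"] + net["b1"], 0.0)  # ReLU
    return (hidden @ net["w2"])[:, 0]

def nam_forward(nets, bias, X):
    """Additive model: logit = bias + sum_i f_i(x_i); returns probabilities and
    the per-feature contributions that serve as the explanation."""
    contributions = np.stack(
        [feature_contribution(net, X[:, i]) for i, net in enumerate(nets)], axis=1
    )
    logits = bias + contributions.sum(axis=1)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs, contributions

rng = np.random.default_rng(0)
n_features = 3
nets = [init_feature_net(rng) for _ in range(n_features)]
X = rng.normal(size=(4, n_features))  # e.g. deviation scores for 4 speakers

probs, contributions = nam_forward(nets, 0.0, X)
print("Predicted probabilities:", probs)
print("Per-feature contributions:\n", contributions)
```

Because the logit is a plain sum, each column of `contributions` shows how much one feature pushed a given prediction up or down, which is the kind of feature-level explanation the abstract refers to.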

Paper Structure (27 sections, 1 equation, 10 figures, 10 tables)

Introduction
Challenges in the automatic detection of speech affecting diseases
Overlapping manifestations in speech and multimorbidity
Data scarcity
Related work
Characterizing reference speech
Reference intervals in clinical laboratory science
Corpora
Reference Population Data
Datasets for disease detection
Reference speech characterization (RSC)
Reference Speech Characterization Pipeline
Reference Speech Results
Classification of speech affecting diseases
Disease detection pipeline
...and 12 more sections

Figures (10)

Figure 1: Examples of mechanisms through which speech-affecting diseases impact the speech signal. The diagram includes references: voleti_berisha2019review, bedi2015psychosis, iter2018schizophrenia, sanz2022coherenceAD, pompili2019phd, hier1985language_dementia, forbes2002distinct, oppenheim1994earliest_ad, hoffmann2010temporal_ad, krishnan2002comorbidity, laird2019late-life-depression-psychobiological, ritasingh2019profiling, tolboll2019depression_1stppronouns, rude2004language_depression, flint1993abnormal_depression, cummins2015review, ma2020voicePD, vasquez2017convolutional, ramig2008speech_pd, hecker2022voice, asiaee2020voice_covid, al2021covid_vocal_folds, OSA6-pozo2009, malhotra2002osa_lancet, monoson_and_fox1987preliminary, halahakoon2019cognitive, kerns2002cognitive, hoodin1989nasal, martinez2016speech, noffs2021speech, noffs2018speech, jiao_berisha2017interpretable, saxon2019objective. (*) Articulation rate and pauses have been reported to be associated with depression via psychomotor retardation; however, although psychomotor retardation is also present in PD, reports on these features for PD are inconsistent (skodda2011aspects). (**) Disturbed nasalization, as a consequence of impaired velum control, is associated with PD via psychomotor retardation and with OSA. In PD, an increase in nasal airflow is reported (hoodin1989nasal), while for OSA, a smaller difference between nasal and oral sounds is reported (monoson_and_fox1987preliminary).
Figure 2: Overview of the steps entailed for reference speech characterization.
Figure 3: Correlation analysis of the features extracted from vowel recordings (top) and picture description transcriptions (bottom). (a) shows the Pearson correlation between the features. The values above the diagonal refer to features extracted for female subjects, while the values below the diagonal refer to male subjects. (b) shows the dendrogram resulting from the hierarchical clustering of features, based on their Pearson correlation. The y axis corresponds to $1 - \text{CorrelationThreshold}$, to capture the distance between features of the same cluster.
Figure 4: Radar plots to characterize reference speech, using the task sustained vowel /a/. The dark green line corresponds to the mean value of each feature, while the light green lines correspond to the RI, computed using the reference population. Blue lines correspond to control speakers, whereas pink lines correspond to patients (PD).
Figure 5: Radar plots to characterize reference speech, using the task picture description. The dark green line corresponds to the mean value of each feature, while the light green lines correspond to the reference interval, computed using the reference population. Blue lines correspond to control speakers, whereas pink lines correspond to patients (AD).
...and 5 more figures
