Speech as a Biomarker for Disease Detection
Catarina Botelho, Alberto Abad, Tanja Schultz, Isabel Trancoso
TL;DR
Speech serves as a rich, noninvasive biomarker for diseases affecting the respiratory, nervous, and muscular systems, but practical deployment is hindered by non-specific speech changes and a lack of interpretability. The authors propose an interpretable framework that defines reference speech via reference intervals (RIs) derived from a reference population, with RI limits set at the $2.5^{th}$ and $97.5^{th}$ percentiles, and uses Neural Additive Models (NAMs) to detect disease signatures as deviations from these RIs. They apply the framework to Alzheimer's disease and Parkinson's disease using publicly available corpora (ADReSS and PC-GITA) and provide transparent, feature-level explanations of decisions. The work lays the groundwork for dataset-agnostic, multi-disease speech diagnostics that can function as a second opinion for clinicians and support broader health monitoring.
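To make the reference-interval idea concrete, here is a minimal sketch of computing per-feature RIs as the 2.5th and 97.5th percentiles of a reference population and scoring a new speaker against them. The function names and the deviation measure (signed distance outside the interval, scaled by the interval width) are assumptions made for illustration; the summary above does not specify how the authors quantify deviations.

```python
import numpy as np


def reference_intervals(reference_features):
    """Per-feature RI limits from a (n_speakers, n_features) array."""
    lower = np.percentile(reference_features, 2.5, axis=0)
    upper = np.percentile(reference_features, 97.5, axis=0)
    return lower, upper


def deviation_from_ri(x, lower, upper):
    """Signed deviation of feature values x from their RIs.

    Returns 0 inside the interval, a negative value below the lower
    limit, and a positive value above the upper limit. Scaling by the
    interval width is an illustrative assumption, not the paper's
    definition.
    """
    x = np.asarray(x, dtype=float)
    # Guard against zero-width intervals for constant features.
    width = np.where(upper > lower, upper - lower, 1.0)
    below = np.minimum(x - lower, 0.0)
    above = np.maximum(x - upper, 0.0)
    return (below + above) / width


# Toy reference population: one feature taking the values 0..100.
reference = np.arange(101, dtype=float).reshape(-1, 1)
lower, upper = reference_intervals(reference)
scores = deviation_from_ri([[50.0], [100.0], [0.0]], lower, upper)
```

In this toy case the first speaker falls inside the interval and scores zero, while the other two fall above and below it and receive positive and negative scores respectively.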
Abstract
Speech is a rich biomarker that encodes substantial information about the health of a speaker, and thus it has been proposed for the detection of numerous diseases, achieving promising results. However, questions remain about what the models trained for the automatic detection of these diseases are actually learning and the basis for their predictions, which can significantly impact patients' lives. This work advocates for an interpretable health model, suitable for detecting several diseases, motivated by the observation that speech-affecting disorders often have overlapping effects on speech signals. A framework is presented that first defines "reference speech" and then leverages this definition for disease detection. Reference speech is characterized through reference intervals, i.e., the typical values of clinically meaningful acoustic and linguistic features derived from a reference population. This novel approach in the field of speech as a biomarker is inspired by the use of reference intervals in clinical laboratory science. Deviations of new speakers from this reference model are quantified and used as input to detect Alzheimer's and Parkinson's disease. The classification strategy explored is based on Neural Additive Models, a type of glass-box neural network, which enables interpretability. The proposed framework for reference speech characterization and disease detection is designed to support the medical community by providing clinically meaningful explanations that can serve as a valuable second opinion.
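The additive structure that makes a Neural Additive Model a glass-box classifier can be sketched as follows: each input feature passes through its own small network, and the prediction is the sigmoid of the sum of those per-feature contributions. This is an untrained NumPy forward pass with hypothetical names (`init_nam`, `nam_forward`) and an arbitrary one-hidden-layer size; it is not the authors' architecture or training procedure.

```python
import numpy as np


def init_nam(n_features, hidden=8, seed=0):
    """Random (untrained) parameters: one tiny MLP per feature."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(size=(n_features, hidden)),  # input -> hidden
        "b1": np.zeros((n_features, hidden)),
        "w2": rng.normal(size=(n_features, hidden)),  # hidden -> scalar
        "bias": 0.0,
    }


def nam_forward(params, x):
    """Return class probability and per-feature contributions.

    x has shape (n_samples, n_features), e.g. deviations from the
    reference intervals. Feature j only ever passes through network j.
    """
    x = np.asarray(x, dtype=float)
    # ReLU hidden layer, computed independently for every feature.
    h = np.maximum(x[:, :, None] * params["w1"] + params["b1"], 0.0)
    contributions = (h * params["w2"]).sum(axis=2)  # (n_samples, n_features)
    logits = contributions.sum(axis=1) + params["bias"]
    prob = 1.0 / (1.0 + np.exp(-logits))
    return prob, contributions


params = init_nam(n_features=3)
x = np.zeros((2, 3))
x[:, 1] = 1.0  # only the second feature deviates from its reference
prob, contributions = nam_forward(params, x)
```

Because the logit is a plain sum, each entry of `contributions` can be read as how much one feature pushed the decision toward or away from the disease class, which is what a feature-level explanation relies on.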
