On the Relevance of Clinical Assessment Tasks for the Automatic Detection of Parkinson's Disease Medication State from Speech
David Gimeno-Gómez, Rubén Solera-Ureña, Anna Pompili, Carlos-D. Martínez-Hinarejos, Rita Cardoso, Isabel Guimarães, Joaquim J. Ferreira, Alberto Abad
TL;DR
This study addresses the automatic detection of Parkinson's disease (PD) medication states from speech within a speaker-independent framework. It evaluates a hybrid representation strategy that combines knowledge-based features (eGeMAPS) with self-supervised (SSL) Wav2Vec2.0 embeddings, using SVM and A-DNN classifiers, with data grouped by speech assessment task. The key finding is that SSL-based representations on continuous, prosody-rich tasks (notably PROS-SENT) yield the best performance, reaching an F1-score of 88.2% in the speaker-independent setting, with SVMs often matching or surpassing A-DNNs. The work has practical implications for remote PD monitoring: careful task selection and a simple model such as an SVM can ease real-world deployment. It also points to interpretability and richer clinical data integration as directions for future work.
Abstract
The automatic identification of medication states of Parkinson's disease (PD) patients can assist clinicians in monitoring and scheduling personalized treatments, as well as studying the effects of medication in alleviating the motor symptoms that characterize the disease. This paper explores speech as a non-invasive and accessible biomarker for identifying PD medication states, introducing a novel approach that addresses this task from a speaker-independent perspective. While traditional machine learning models achieve competitive results, self-supervised speech representations prove essential for optimal performance, significantly surpassing knowledge-based acoustic descriptors. Experiments across diverse speech assessment tasks highlight the relevance of prosody and continuous speech in distinguishing medication states, reaching an F1-score of 88.2%. These findings may streamline clinicians' work and reduce patient effort in voice recordings.
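The speaker-independent evaluation mentioned above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the round-robin assignment of speakers to folds, and the toy speaker IDs and labels are all assumptions made for illustration. The sketch only shows the constraint that every recording from a given speaker falls entirely in either the training or the test partition, together with a plain binary F1-score. The averaging scheme behind the reported 88.2% is not stated in this section, so the single-class F1 below is likewise an assumption.

```python
from collections import defaultdict


def speaker_independent_folds(speaker_ids, n_folds):
    """Yield (train_idx, test_idx) pairs in which no speaker is on both sides.

    Hypothetical helper: whole speakers are assigned to folds round-robin.
    """
    # Group recording indices by the speaker who produced them.
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)

    speakers = sorted(by_speaker)
    if n_folds < 2 or n_folds > len(speakers):
        raise ValueError("n_folds must be between 2 and the number of speakers")

    for fold in range(n_folds):
        # Every n_folds-th speaker, starting at `fold`, is held out for testing.
        held_out = set(speakers[fold::n_folds])
        test_idx = [i for s in speakers if s in held_out for i in by_speaker[s]]
        train_idx = [i for s in speakers if s not in held_out for i in by_speaker[s]]
        yield train_idx, test_idx


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (assumed): two recordings per patient, 1 = ON state, 0 = OFF state.
speakers = ["p01", "p01", "p02", "p02", "p03", "p03"]
labels = [1, 0, 1, 0, 1, 0]

for train_idx, test_idx in speaker_independent_folds(speakers, n_folds=3):
    train_speakers = {speakers[i] for i in train_idx}
    test_speakers = {speakers[i] for i in test_idx}
    # The defining property of a speaker-independent split.
    assert train_speakers.isdisjoint(test_speakers)
```

In practice a classifier would be fitted on the features indexed by `train_idx` and scored with `f1_score` on `test_idx` in each fold. Splitting by recording rather than by speaker would let the model see the same voice during training and testing, which tends to inflate the reported score.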
