Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease
Peter Plantinga, Roozbeh Sattari, Karine Marcotte, Carla Di Gironimo, Madeleine Sharp, Liziane Bouvier, Maiya Geddes, Ingrid Verduyckt, Étienne de Villers-Sidani, Mirco Ravanelli, Denise Klein
TL;DR
This paper investigates detecting Parkinson's disease from speech by directly comparing human expert judgments to a Whisper-based classifier across five speech tasks. Using the Quebec Parkinson Network dataset, the authors demonstrate that audio-only predictions from Whisper can match or exceed human performance in several demographic subgroups, particularly for spontaneous speech from younger patients, mild cases, and female patients. The results highlight complementary strengths: clinicians leverage multimodal context, while Whisper exploits acoustic cues that humans may overlook. The work emphasizes the potential of ML-based speech analysis to improve access to diagnostic screening, alongside the need for explainability and accountability in clinical deployment.
Abstract
The speech of people with Parkinson's Disease (PD) has been shown to hold important clues about the presence and progression of the disease. We investigate the factors on which human experts base their judgments of the presence of disease in speech samples across five different speech tasks: phonations, sentence repetition, reading, recall, and picture description. We make comparisons by conducting listening tests to determine clinicians' accuracy at recognizing signs of PD from audio alone, and by conducting experiments with a machine learning detection system based on Whisper. Across tasks, Whisper performs on par with or better than human experts when only audio is available, especially on challenging but important subgroups of the data: younger patients, mild cases, and female patients. Whisper's ability to recognize acoustic cues in difficult cases complements the multimodal and contextual strengths of human experts.
