Towards a Generalizable Speech Marker for Parkinson's Disease Diagnosis

Maksim Siniukov, Ellie Xing, Sanaz Attaripour Isfahani, Mohammad Soleymani

TL;DR

The study addresses the challenge of generalizing speech-based Parkinson's disease detection across languages and datasets by coupling domain adaptive pretraining (DAPT) of HuBERT on elderly speech with domain adversarial training (DAT) to learn language-invariant features. The approach leverages unlabeled elder-focused data (e.g., AgeVOX-Celeb, mPower) for adaptation and applies DAT to PD datasets in English, Italian, and Spanish, achieving state-of-the-art per-person metrics across four public PD corpora. Key results show high per-person F1, accuracy, sensitivity, PPV, and specificity (e.g., F1pp ≈ 89.2, Accpp ≈ 92.0, SEpp ≈ 91.2, PPVpp ≈ 90.5, SPpp ≈ 92.1), demonstrating robust cross-language PD detection. The method offers a non-invasive, scalable screening option with potential clinical impact for early diagnosis and monitoring, though it requires further validation and interpretability improvements to be deployed in routine care.
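The per-person metrics listed above can be illustrated with a short sketch. This excerpt does not state how segment-level predictions are aggregated per speaker, so the majority vote used here (with ties counted as PD) is an assumption, and the function names `per_person_predictions` and `binary_metrics` are hypothetical.

```python
from collections import defaultdict


def per_person_predictions(segment_preds):
    """Collapse (person_id, label) segment predictions into one label per person.

    Assumption: simple majority vote over segments, ties resolved as PD (1).
    The aggregation rule actually used in the paper is not given in this excerpt.
    """
    votes = defaultdict(list)
    for person_id, label in segment_preds:
        votes[person_id].append(label)
    return {pid: int(2 * sum(v) >= len(v)) for pid, v in votes.items()}


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, accuracy and F1 for binary labels (1 = PD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),   # SE: recall on PD speakers
        "specificity": ratio(tn, tn + fp),   # SP: recall on healthy controls
        "ppv": ratio(tp, tp + fp),           # PPV: precision on PD predictions
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Toy data: segment-level predictions for four hypothetical speakers.
    segments = [("p1", 1), ("p1", 1), ("p1", 0),
                ("p2", 0), ("p2", 0),
                ("p3", 1), ("p3", 0),
                ("p4", 0), ("p4", 1), ("p4", 0)]
    truth = {"p1": 1, "p2": 0, "p3": 1, "p4": 1}

    preds = per_person_predictions(segments)
    ids = sorted(truth)
    print(binary_metrics([truth[i] for i in ids], [preds[i] for i in ids]))
```

Scoring per person rather than per segment keeps speakers who contribute many recordings from dominating the metric, which matters when the corpora differ in recordings per speaker.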

Abstract

Parkinson's Disease (PD) is a neurodegenerative disorder characterized by motor symptoms, including altered voice production in the early stages. Early diagnosis is crucial not only to improve PD patients' quality of life but also to enhance the efficacy of potential disease-modifying therapies during early neurodegeneration, a window often missed by current diagnostic tools. In this paper, we propose a more generalizable approach to PD recognition through domain adaptation and self-supervised learning. We demonstrate the generalization capabilities of the proposed approach across diverse datasets in different languages. Our approach leverages HuBERT, a large deep neural network originally trained for speech recognition, and further trains it in a self-supervised manner on unlabeled speech data from a population similar to the target group, i.e., the elderly. The model is then fine-tuned and adapted for use across different datasets in multiple languages, including English, Italian, and Spanish. Evaluations on four publicly available PD datasets demonstrate the model's efficacy, achieving an average specificity of 92.1% and an average sensitivity of 91.2%. This method offers objective and consistent evaluations across large populations, addressing the variability inherent in human assessments and providing a non-invasive, cost-effective, and accessible diagnostic option.

Paper Structure (17 sections, 6 equations, 1 figure, 4 tables)

Figures (1)

  • Figure 1: Overview of the pipeline. In the first stage, Domain Adaptive Pretraining (DAPT) adapts the HuBERT model to target-domain data containing speech from the elderly$^{*}$, using a masked-unit cross-entropy loss. The DAPT-tuned weights are then transferred to the second stage, in which Domain Adversarial Training (DAT) is applied to the PD datasets$^{**}$. The DAT objectives are a PD prediction cross-entropy loss and a domain discrimination cross-entropy loss, with a Gradient Reversal Layer (GRL) placed between the HuBERT network and the domain classifier. $^{*}$ Target-domain data may be either the mPower dataset or the AgeVox dataset. $^{**}$ The PD datasets are PD-GITA, PD-Neurovoz, PD-Italian, and MDVR-KCL.
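
The second-stage objective in the caption can be written out as a hedged sketch. It follows the standard gradient-reversal formulation of domain adversarial training and is not copied from the paper's own equations. All symbols are assumptions introduced for illustration: $F_{\theta_f}$ is the HuBERT encoder, $C_{\theta_y}$ the PD classifier, $D_{\theta_d}$ the domain classifier, $y_i$ the PD label, $d_i$ the domain label (for example, the source dataset or language), $\lambda$ the reversal strength, and $\mu$ the learning rate.

```latex
% Sketch of DAT with a Gradient Reversal Layer (GRL); notation is assumed, not the paper's.
\begin{align}
h_i &= F_{\theta_f}(x_i)
  && \text{encoder features} \\
R_\lambda(h) &= h, \qquad \frac{\partial R_\lambda}{\partial h} = -\lambda I
  && \text{GRL: identity forward, reversed gradient backward} \\
\mathcal{L}_{\mathrm{PD}} &= -\frac{1}{N}\sum_{i=1}^{N} \log C_{\theta_y}\!\left(y_i \mid h_i\right)
  && \text{PD prediction cross-entropy} \\
\mathcal{L}_{\mathrm{dom}} &= -\frac{1}{N}\sum_{i=1}^{N} \log D_{\theta_d}\!\left(d_i \mid R_\lambda(h_i)\right)
  && \text{domain discrimination cross-entropy} \\
\theta_y &\leftarrow \theta_y - \mu \frac{\partial \mathcal{L}_{\mathrm{PD}}}{\partial \theta_y}, \qquad
\theta_d \leftarrow \theta_d - \mu \frac{\partial \mathcal{L}_{\mathrm{dom}}}{\partial \theta_d} \\
% Here the domain-loss gradient w.r.t. \theta_f is the ordinary (unreversed) one;
% the minus sign is the effect of the GRL.
\theta_f &\leftarrow \theta_f - \mu \left( \frac{\partial \mathcal{L}_{\mathrm{PD}}}{\partial \theta_f}
  - \lambda \frac{\partial \mathcal{L}_{\mathrm{dom}}}{\partial \theta_f} \right)
\end{align}
```

Under this formulation the domain classifier is trained to identify the domain, while the reversed gradient pushes the encoder toward features from which the domain is hard to recover. This is one way to obtain the language-invariant features described in the TL;DR.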