Cross-Lingual Multi-Granularity Framework for Interpretable Parkinson's Disease Diagnosis from Speech

Ilias Tougui, Mehdi Zakroum, Mounir Ghogho

TL;DR

This work presents a cross-lingual, granularity-aware framework for Parkinson's disease detection from speech by extracting time-aligned phonemes, syllables, and words across Italian, Spanish, and English. A BiLSTM with multi-head attention processes XLSR-53-based features from multi-granularity inputs, enabling direct comparison of phoneme-, syllable-, and word-level biomarkers. Phoneme-level analysis yields the strongest performance (AUROC ≈ 0.938; accuracy ≈ 0.922), with interpretability analyses aligning attention with clinically established biomarkers such as sustained vowels, DDK syllables, and the /pataka/ sequence. The results underscore the potential of fine-grained, cross-lingual speech biomarkers for PD screening and motivate broader linguistic coverage and clinical validation.

Abstract

Parkinson's Disease (PD) affects over 10 million people worldwide, with speech impairments in up to 89% of patients. Current speech-based detection systems analyze entire utterances, potentially overlooking the diagnostic value of specific phonetic elements. We developed a granularity-aware approach for multilingual PD detection using an automated pipeline that extracts time-aligned phonemes, syllables, and words from recordings. Using Italian, Spanish, and English datasets, we implemented a bidirectional LSTM with multi-head attention to compare diagnostic performance across the different granularity levels. Phoneme-level analysis achieved superior performance with AUROC of 93.78% ± 2.34% and accuracy of 92.17% ± 2.43%. This demonstrates enhanced diagnostic capability for cross-linguistic PD detection. Importantly, attention analysis revealed that the most informative speech features align with those used in established clinical protocols: sustained vowels (/a/, /e/, /o/, /i/) at phoneme level, diadochokinetic syllables (/ta/, /pa/, /la/, /ka/) at syllable level, and /pataka/ sequences at word level. Source code will be available at https://github.com/jetliqs/clearpd.
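To make "time-aligned" units concrete, the sketch below shows one possible way to represent speech units at the three granularity levels and to recover which finer units fall inside a coarser one. It is an illustration only, not the authors' pipeline: the `SpeechUnit` class, the `units_within` helper, and all timings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpeechUnit:
    """One time-aligned unit (hypothetical structure, not from the paper)."""
    label: str    # e.g. "a", "pa", "pataka"
    level: str    # "phoneme", "syllable", or "word"
    start: float  # onset in seconds
    end: float    # offset in seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def units_within(parent: SpeechUnit, candidates: List[SpeechUnit]) -> List[SpeechUnit]:
    """Return the candidates whose time span lies inside the parent's span."""
    eps = 1e-9  # tolerance for boundaries that coincide up to rounding
    return [
        u for u in candidates
        if u.start >= parent.start - eps and u.end <= parent.end + eps
    ]


# Made-up alignment for a single /pataka/ repetition.
word = SpeechUnit("pataka", "word", 0.00, 0.60)
syllables = [
    SpeechUnit("pa", "syllable", 0.00, 0.20),
    SpeechUnit("ta", "syllable", 0.20, 0.40),
    SpeechUnit("ka", "syllable", 0.40, 0.60),
    SpeechUnit("pa", "syllable", 0.60, 0.80),  # start of the next repetition
]

inside = units_within(word, syllables)
print([u.label for u in inside])  # the three syllables of this repetition
```

Keeping explicit start and end times on every unit lets the same recording be sliced at the phoneme, syllable, or word level without re-running alignment.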

Paper Structure

This paper contains 6 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Speech units extraction framework: Recordings are processed through voice activity detection, transcription, phonemization and syllabification to infer multi-granular speech units like words, syllables and phonemes with temporal boundaries.
  • Figure 2: Architecture of the Parkinson's Disease Prediction Model: a bidirectional LSTM with multi-head attention.
  • Figure 3: Multi-granularity attention weights of the model on the test set for PD recognition. Heat maps show importance rankings of top 20 phonemes, syllables, and words, with color intensity indicating attention scores.
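A top-20 ranking like the one described for Figure 3 can be sketched as follows. This is a minimal illustration under one assumption: each unit label is scored by the mean attention weight over its occurrences in the test set. The summary above does not say how the authors aggregate attention, and the function name `rank_units_by_attention` and the sample weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_units_by_attention(
    pairs: Iterable[Tuple[str, float]], top_k: int = 20
) -> List[Tuple[str, float]]:
    """Rank unit labels by mean attention weight (assumed aggregation)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for label, weight in pairs:
        totals[label] += weight
        counts[label] += 1
    means = {label: totals[label] / counts[label] for label in totals}
    # Sort by descending mean weight; break ties alphabetically for stability.
    return sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Made-up (label, attention weight) observations at the phoneme level.
observations = [("a", 0.30), ("a", 0.50), ("t", 0.10), ("e", 0.25), ("t", 0.20)]
ranking = rank_units_by_attention(observations, top_k=2)
print([label for label, _ in ranking])
```

Running the same function separately on phoneme, syllable, and word observations would give one ranking per granularity level, which is the shape of data a three-panel heat map needs.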