XPPG-PCA: Reference-free automatic speech severity evaluation with principal components

Bence Mark Halpern, Thomas B. Tienkamp, Teja Rebernik, Rob J. J. H. van Son, Sebastiaan A. H. J. de Visscher, Max J. H. Witjes, Defne Abur, Tomoki Toda

TL;DR

This work targets automatic, reference-free severity evaluation of pathological speech by fusing x-vector embeddings with phonetic posteriorgrams and reducing them via PCA. The severity score, $s_{\text{noref}} = h(x_{\text{path}}) \cdot C_1$, is the projection of the fused features of a pathological utterance onto the first principal component. The method is unsupervised and requires no perceptual labels, which supports generalization across disorders and datasets. Evaluations on Dutch oral cancer cohorts show XPPG-PCA reaching correlations of up to around 0.90 with perceptual scores, remaining robust to noise and to a limited number of utterances, and often outperforming reference-based baselines. Limitations include a focus on read speech and weaker performance on dysarthria. An open-source implementation is available, and the authors outline language expansion and feature enhancements as directions for broadening clinical utility.
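The projection s_noref = h(x_path) · C_1 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature vectors here are synthetic stand-ins for the fused x-vector and posteriorgram features, and the names `fit_first_component`, `severity_score`, and `latent` are hypothetical. Which recordings the PCA is fitted on, and how the features are fused, are assumptions of this example.

```python
import numpy as np


def fit_first_component(features):
    """Fit PCA via SVD on row-wise feature vectors; return (mean, C_1)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are unit-norm principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[0]


def severity_score(feature, mean, c1):
    """Project one feature vector onto C_1.

    Subtracting the mean only shifts every score by the same constant,
    so it does not change correlations with perceptual ratings.
    """
    return float((feature - mean) @ c1)


# Synthetic stand-in for h(x): one 16-dimensional vector per speaker,
# driven by a hypothetical hidden severity value plus small noise.
rng = np.random.default_rng(0)
n_speakers, dim = 40, 16
latent = rng.uniform(0.0, 1.0, size=n_speakers)
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
features = 5.0 * np.outer(latent, direction) + 0.1 * rng.normal(size=(n_speakers, dim))

mean, c1 = fit_first_component(features)
scores = np.array([severity_score(f, mean, c1) for f in features])

# Pearson correlation between the unsupervised scores and the hidden severity.
r = np.corrcoef(scores, latent)[0, 1]
print(f"|r| = {abs(r):.3f}")
```

The sign of a principal component is arbitrary, so the sketch reports the absolute correlation; in practice the direction would need to be fixed by some convention, for example so that typical speech scores lower than pathological speech.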

Abstract

Reliably evaluating the severity of a speech pathology is crucial in healthcare. However, the current reliance on expert evaluations by speech-language pathologists presents several challenges: while their assessments are highly skilled, they are also subjective, time-consuming, and costly, which can limit the reproducibility of clinical studies and place a strain on healthcare resources. While automated methods exist, they have significant drawbacks. Reference-based approaches require transcriptions or healthy speech samples, restricting them to read speech and limiting their applicability. Existing reference-free methods are also flawed; supervised models often learn spurious shortcuts from data, while handcrafted features are often unreliable and restricted to specific speech tasks. This paper introduces XPPG-PCA (x-vector phonetic posteriorgram principal component analysis), a novel, unsupervised, reference-free method for speech severity evaluation. Using three Dutch oral cancer datasets, we demonstrate that XPPG-PCA performs comparably to, or exceeds, established reference-based methods. Our experiments confirm its robustness against data shortcuts and noise, showing its potential for real-world clinical use. Taken together, our results show that XPPG-PCA provides a robust, generalizable solution for the objective assessment of speech pathology, with the potential to significantly improve the efficiency and reliability of clinical evaluations across a range of disorders. An open-source implementation is available.

Paper Structure

This paper contains 33 sections, 7 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Overview of our proposed XPPG-PCA approach
  • Figure 2: Annotated severity scores plotted against the automatic results produced by the XPPG-PCA
  • Figure 3: Effect of noise on severity evaluation performance as measured by Pearson's correlation and Root Mean Squared Error (RMSE).
  • Figure 4: Correlation results for severity on the following datasets: (a) NKI-SpeechRT, (b) NKI-OC-VC, and (c, d) NKI-RUG-UMCG. The purple shaded area in (d) represents the $95\%$ confidence intervals, while the blue line marks the point where the correlation exceeds $r=0.8$, even when considering the lower bound of the confidence intervals. The lower part of the plot zooms into the region from 1-50 for better visibility.