XPPG-PCA: Reference-free automatic speech severity evaluation with principal components
Bence Mark Halpern, Thomas B. Tienkamp, Teja Rebernik, Rob J. J. H. van Son, Sebastiaan A. H. J. de Visscher, Max J. H. Witjes, Defne Abur, Tomoki Toda
TL;DR
This work targets automatic, reference-free severity evaluation of pathological speech by fusing x-vector embeddings with phonetic posteriorgrams and reducing them via PCA. The method, s_noref = h(x_path) · C_1, is unsupervised and requires no perceptual labels, which is intended to let it generalize across disorders and datasets. Evaluations on Dutch oral cancer cohorts show XPPG-PCA achieving correlations of up to about 0.90 with perceptual scores and remaining robust to noise and to limited numbers of utterances, often outperforming reference-based baselines. Limitations include an emphasis on read speech and a performance gap on dysarthria. An open-source implementation is available, and future directions include language expansion and feature enhancements to broaden clinical utility.
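The score s_noref = h(x_path) · C_1 can be illustrated with a minimal sketch, reading h(x_path) as the fused feature vector of one recording and C_1 as the first principal component. This is not the authors' implementation: the toy data stand in for real x-vector and posteriorgram features, the function names `fit_first_component` and `severity_score` are hypothetical, and centering the features before projection is an assumption not stated above.

```python
import numpy as np


def fit_first_component(features):
    """Return (mean, c1): the feature mean and the first principal direction."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[0]


def severity_score(h, mean, c1):
    """Project one (assumed centered) feature vector onto the first component."""
    return float(np.dot(np.asarray(h, dtype=float) - mean, c1))


# Toy stand-in for fused features: 20 recordings, 8 dimensions, with one
# dominant latent factor playing the role of severity (purely illustrative).
rng = np.random.default_rng(0)
latent = rng.normal(size=20)
direction = np.ones(8) / np.sqrt(8)
toy = 3.0 * np.outer(latent, direction) + 0.1 * rng.normal(size=(20, 8))

mean, c1 = fit_first_component(toy)
scores = np.array([severity_score(h, mean, c1) for h in toy])
```

The sign of a principal component is arbitrary, so in a sketch like this the orientation of the score (higher meaning more or less severe) would have to be fixed separately, for example against a few recordings of known severity.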
Abstract
Reliably evaluating the severity of a speech pathology is crucial in healthcare. However, the current reliance on expert evaluations by speech-language pathologists presents several challenges: while their assessments are highly skilled, they are also subjective, time-consuming, and costly, which can limit the reproducibility of clinical studies and place a strain on healthcare resources. Automated methods exist, but they have significant drawbacks. Reference-based approaches require transcriptions or healthy speech samples, restricting them to read speech and limiting their applicability. Existing reference-free methods are also flawed: supervised models often learn spurious shortcuts from data, while handcrafted features are often unreliable and restricted to specific speech tasks. This paper introduces XPPG-PCA (x-vector phonetic posteriorgram principal component analysis), a novel, unsupervised, reference-free method for speech severity evaluation. Using three Dutch oral cancer datasets, we demonstrate that XPPG-PCA performs comparably to, or better than, established reference-based methods. Our experiments confirm its robustness against data shortcuts and noise, showing its potential for real-world clinical use. Taken together, our results show that XPPG-PCA provides a robust, generalizable solution for the objective assessment of speech pathology, with the potential to significantly improve the efficiency and reliability of clinical evaluations across a range of disorders. An open-source implementation is available.
