Quantifying the effect of speech pathology on automatic and human speaker verification
Bence Mark Halpern, Thomas Tienkamp, Wen-Chin Huang, Lester Phillip Violeta, Teja Rebernik, Sebastiaan de Visscher, Max Witjes, Martijn Wieling, Defne Abur, Tomoki Toda
TL;DR
This work investigates how speech pathology resulting from oral cancer surgery affects automatic speaker verification (ASV) and its relation to speech severity, using parallel pre-/post-surgery Dutch data from NKI-OC-VC and SPOKE. It introduces objective measures of speaker similarity ($sim_{obj}$) and severity ($sev_{obj}$) and pairs them with subjective judgments ($sim_{sub}$, $sev_{sub}$) to compare with human perception. The study finds that pathology degrades ASV performance, with larger degradation tied to higher severity, and observes only moderate agreement between objective and subjective assessments; human perceptual judgments do not clearly mirror the ASV findings. Practically, the results underscore robustness gaps in ASV for pathological speech and suggest data augmentation and listener-aware modeling as potential routes, while cautioning against using ASV as a substitute for perceptual evaluation in voice-conversion contexts.
Abstract
This study investigates how speech pathology resulting from oral cancer surgery affects the performance of an automatic speaker verification (ASV) system. Using two recently collected Dutch datasets with parallel pre- and post-surgery audio from the same speakers, NKI-OC-VC and SPOKE, we assess the extent to which speech pathology influences ASV performance, and whether objective and subjective measures of speech severity are correlated with that performance. Finally, we carry out a perceptual study to compare the judgements of the ASV system with those of human listeners. Our findings reveal that pathological speech negatively affects ASV performance, and that speech severity is negatively correlated with ASV performance. There is moderate agreement between perceptual and objective scores of speaker similarity and severity; however, the perceptual study could not clearly establish whether the same phenomenon also exists in human perception.
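As an illustration of the kind of analysis described above, the following minimal sketch scores pre- versus post-surgery speaker similarity and correlates it with severity. It rests on assumptions that the abstract does not state: speaker similarity is taken to be the cosine similarity between speaker embeddings, the relation to severity is measured with Spearman's rank correlation, and all vectors and ratings are made-up toy values. The function names are hypothetical and this is not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (assumed similarity score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): one pre-surgery and one post-surgery embedding
# per speaker, plus a severity rating per speaker (higher = more severe).
pre_embeddings = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
post_embeddings = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0], [0.0, 1.0]]
severity = [1.0, 2.0, 3.0, 4.0]

similarities = [
    cosine_similarity(pre, post)
    for pre, post in zip(pre_embeddings, post_embeddings)
]
rho = spearman(similarities, severity)
print("similarities:", [round(s, 3) for s in similarities])
print("Spearman rho (similarity vs. severity):", round(rho, 3))
```

In this toy data the similarity drops as the severity rating rises, so the rank correlation is strongly negative, which mirrors the direction of the relationship the abstract reports. In practice the embeddings would come from a trained ASV model and the severity ratings from listeners or an objective measure.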
