Quantifying the effect of speech pathology on automatic and human speaker verification

Bence Mark Halpern; Thomas Tienkamp; Wen-Chin Huang; Lester Phillip Violeta; Teja Rebernik; Sebastiaan de Visscher; Max Witjes; Martijn Wieling; Defne Abur; Tomoki Toda

TL;DR

This work investigates how speech pathology resulting from oral cancer surgery affects automatic speaker verification (ASV) and its relation to speech severity, using parallel pre-/post-surgery Dutch data from NKI-OC-VC and SPOKE. It introduces objective measures of speaker similarity ($sim_{obj}$) and severity ($sev_{obj}$) and pairs them with subjective judgments ($sim_{sub}$, $sev_{sub}$) to compare with human perception. The study finds that pathology degrades ASV performance, with larger degradation tied to higher severity, and observes only moderate agreement between objective and subjective assessments; human perceptual judgments do not clearly mirror the ASV findings. Practically, the results underscore robustness gaps in ASV for pathological speech and suggest data augmentation and listener-aware modeling as potential routes, while cautioning against using ASV as a substitute for perceptual evaluation in voice-conversion contexts.

Abstract

This study investigates how speech pathology resulting from surgical intervention (specifically, oral cancer surgery) impacts the performance of an automatic speaker verification (ASV) system. Using two recently collected Dutch datasets with parallel pre- and post-surgery audio from the same speakers, NKI-OC-VC and SPOKE, we assess the extent to which speech pathology influences ASV performance, and whether objective and subjective measures of speech severity are correlated with that performance. Finally, we carry out a perceptual study to compare the judgements of the ASV system and human listeners. Our findings reveal that pathological speech negatively affects ASV performance, and that the severity of the speech is negatively correlated with that performance. There is moderate agreement between perceptual and objective scores of speaker similarity and severity; however, the perceptual study could not clearly establish whether the same phenomenon also exists in human perception.
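To make the notion of a correlation between severity and ASV performance concrete, the following is a minimal sketch of a rank correlation between two per-speaker lists. It assumes a Spearman correlation, which may not be the statistic used in the paper; the function names and the toy numbers are hypothetical and are not taken from NKI-OC-VC or SPOKE. The sign of the result depends on how the severity score is oriented.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-speaker values, made up purely for illustration.
toy_severity = [0.2, 0.5, 0.7, 0.9]
toy_eer = [0.02, 0.05, 0.04, 0.10]
rho = spearman(toy_severity, toy_eer)
```

Working on ranks rather than raw values keeps the sketch insensitive to the scale of either measure, which is convenient when comparing objective and subjective scores.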

Paper Structure (19 sections, 2 figures, 2 tables)

Introduction
Datasets
NKI-OC-VC
SPOKE
Methods
Objective speaker similarity (ASV) $sim_{obj}$
Objective severity (P-ESTOI) $sev_{obj}$
Subjective similarity and severity
Experiments
RQ1: Is automatic speaker verification impacted by the presence of pathology?
RQ2: Speaker-severity vs speaker-EER
RQ3: General relationship between variables
Results
RQ1: Is ASV impacted by the presence of pathology?
RQ2: Speaker-severity vs speaker-EER
...and 4 more sections

Figures (2)

Figure 1: Comparison of cosine similarity distributions and EER thresholds for same-speaker and different-speaker trials across different time points. Asterisks indicate the significance level of the correlations: (**) $p < .01$, and (***) $p < .001$.
Figure 2: Objective and subjective speaker severity score in relation to EER performance.
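The captions above refer to cosine similarity scores for same-speaker and different-speaker trials and to an EER threshold. As a rough illustration of how these quantities relate, here is a minimal sketch that computes a cosine similarity between two embedding vectors and estimates the equal error rate (EER) by sweeping a threshold over the observed scores. The function names and the toy scores are hypothetical, and the paper's actual ASV system and scoring pipeline may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(same_scores, diff_scores):
    """Return (eer, threshold) where false accepts and false rejects balance.

    A trial is accepted when its score is >= threshold. Candidate thresholds
    are the observed scores; the one with the smallest gap between the two
    error rates is chosen, and the EER is the mean of the two rates there.
    """
    best = None
    for thr in sorted(set(same_scores) | set(diff_scores)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < thr for s in same_scores) / len(same_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= thr for s in diff_scores) / len(diff_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Hypothetical scores, made up purely for illustration.
toy_same = [0.9, 0.8, 0.7, 0.4]
toy_diff = [0.5, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(toy_same, toy_diff)
```

With well-separated score distributions the EER approaches zero; overlap between the same-speaker and different-speaker distributions, as might occur when one recording of a pair is pathological, pushes it upward.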
