Adversarial Reweighting for Speaker Verification Fairness

Minho Jin, Chelsea J. -T. Ju, Zeya Chen, Yi-Chieh Liu, Jasha Droppo, Andreas Stolcke

TL;DR

The paper tackles fairness in speaker verification by introducing adversarial reweighting (ARW) in a metric-learning setting. A min-max objective drives an adversary to upweight underperforming samples, so the learner improves on hard cases without relying on explicit subgroup annotations. Three ARW formulations are developed and evaluated on VoxCeleb: accumulated pairwise similarity (APS), pseudo-labeling with K-means (PL), and pairwise weighting (PW). PW yields the best overall EER of $1.08\%$ and reduces the gender gap from $0.70\%$ to $0.58\%$, while also lowering the standard deviation over nationality groups from $0.21$ to $0.19$. The results show smaller subgroup disparities alongside improved overall performance, suggesting ARW as a practical, annotation-free approach and a basis for extending to other speech tasks such as emotion recognition or medical diagnostics.
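As a rough illustration of the min-max idea above, the following Python sketch shows one way an adversary's per-sample scores could be turned into weights and applied to per-sample losses. The weight formula (one plus a normalized sigmoid scaled by the batch size) is an assumption borrowed from a common choice in adversarially reweighted learning; it is not the APS, PL, or PW formulation of this paper, and the names `arw_weights` and `weighted_loss` are hypothetical.

```python
import math


def arw_weights(scores):
    """Map raw adversary scores to per-sample weights.

    Assumed form: w_i = 1 + n * sigmoid(s_i) / sum_j sigmoid(s_j),
    so every weight is at least 1 and the weights sum to 2n.
    """
    sig = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sig)
    n = len(scores)
    return [1.0 + n * s / total for s in sig]


def weighted_loss(losses, weights):
    """Weighted mean loss: the learner minimizes it, the adversary maximizes it."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Toy per-sample losses: the last two samples are the "hard" ones.
losses = [0.1, 0.2, 1.5, 2.0]

# An indifferent adversary (equal scores) gives every sample the same weight.
uniform = arw_weights([0.0, 0.0, 0.0, 0.0])

# An adversary that scores the hard samples higher upweights them.
focused = arw_weights([-2.0, -2.0, 2.0, 2.0])

print(uniform)                          # all weights equal
print(weighted_loss(losses, uniform))
print(weighted_loss(losses, focused))   # larger than the uniform case
```

In a real training loop the two players would alternate gradient steps on this same quantity, with the learner descending and the adversary ascending; the sketch only shows how the weighting changes the objective for fixed losses.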

Abstract

We address performance fairness for speaker verification using the adversarial reweighting (ARW) method. ARW is reformulated for speaker verification with metric learning, and shown to improve results across different subgroups of gender and nationality, without requiring annotation of subgroups in the training data. An adversarial network learns a weight for each training sample in the batch so that the main learner is forced to focus on poorly performing instances. Using a min-max optimization algorithm, this method improves overall speaker verification fairness. We present three different ARW formulations: accumulated pairwise similarity, pseudo-labeling, and pairwise weighting, and measure their performance in terms of equal error rate (EER) on the VoxCeleb corpus. Results show that the pairwise weighting method can achieve 1.08% overall EER, 1.25% for male and 0.67% for female speakers, with relative EER reductions of 7.7%, 10.1% and 3.0%, respectively. For nationality subgroups, the proposed algorithm showed 1.04% EER for US speakers, 0.76% for UK speakers, and 1.22% for all others. The absolute EER gap between gender groups was reduced from 0.70% to 0.58%, while the standard deviation over nationality groups decreased from 0.21 to 0.19.
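The fairness figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the gender gap and the nationality spread from the reported subgroup EERs. It assumes the standard deviation is the population form (dividing by the number of groups), which is what reproduces the reported 0.19. The baseline gender EERs are not reported in the text; they are back-computed here from the rounded relative reductions, so they are approximate. All variable names are illustrative.

```python
from statistics import pstdev

# Subgroup EERs (%) reported for the pairwise weighting method.
eer_male, eer_female = 1.25, 0.67
eer_nationality = {"US": 1.04, "UK": 0.76, "other": 1.22}

# Absolute EER gap between gender groups.
gender_gap = eer_male - eer_female

# Spread over nationality groups; population standard deviation is an assumption.
nationality_std = pstdev(list(eer_nationality.values()))

# Approximate baseline EERs implied by the reported relative reductions
# (10.1% for male, 3.0% for female); derived values, not reported ones.
implied_male = eer_male / (1 - 0.101)
implied_female = eer_female / (1 - 0.030)
implied_gap = implied_male - implied_female

print(f"gender gap: {gender_gap:.2f}")              # gender gap: 0.58
print(f"nationality std: {nationality_std:.2f}")    # nationality std: 0.19
print(f"implied baseline gap: {implied_gap:.2f}")   # implied baseline gap: 0.70
```

The implied baseline gap agrees with the 0.70% stated in the abstract, which suggests the relative reductions and the gap figures are mutually consistent.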

Paper Structure (11 sections, 16 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Representation bias in the training data, where each group is shown in a different color. For binary classification between two classes (circles and crosses), the bottom panel is fairer than the top panel because it takes the minority group (green) into account.
  • Figure 2: ARW for classification. For inference, only the learner inside the dashed box is used.
  • Figure 3: ARW for speaker verification. We define the weight $\lambda_{\phi}$ differently for APS, PL, and PW.
  • Figure 4: Computing ARW weights $\lambda_{\phi}(j, \{\forall_{k}\mathbf{x}_k^{a}\})$ for APS.
  • Figure 5: Computing weights $\lambda^K_{\phi}(\mathbf{x}_j^{a})$ for PL.
  • ...and 1 more figure