Fair-Gate: Fairness-Aware Interpretable Risk Gating for Sex-Fair Voice Biometrics

Yangyang Qu, Massimiliano Todisco, Chiara Galdi, Nicholas Evans

Abstract

Voice biometric systems can exhibit sex-related performance gaps even when overall verification accuracy is strong. We attribute these gaps to two practical mechanisms: (i) demographic shortcut learning, where speaker classification training exploits spurious correlations between sex and speaker identity, and (ii) feature entanglement, where sex-linked acoustic variation overlaps with identity cues and cannot be removed without degrading speaker discrimination. We propose Fair-Gate, a fairness-aware and interpretable risk-gating framework that addresses both mechanisms in a single pipeline. Fair-Gate applies risk extrapolation to reduce variation in speaker-classification risk across proxy sex groups, and introduces a local complementary gate that routes intermediate features into an identity branch and a sex branch. The gate provides interpretability by producing an explicit routing mask that can be inspected to understand which features are allocated to identity versus sex-related pathways. Experiments on VoxCeleb1 show that Fair-Gate improves the utility-fairness trade-off, yielding more sex-fair automatic speaker verification (ASV) performance under challenging evaluation conditions.

Paper Structure (22 sections, 16 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Desired decision vs. demographic shortcut in speaker verification under a shared threshold. A verifier should base its decision on identity evidence by comparing enrollment and test utterances (solid arrow). However, because sex affects acoustics (e.g., $F_0$ and formant structure) and can be spuriously correlated with speaker identity in the training data, the model may also exploit sex-linked cues as a shortcut (dashed arrow). Such shortcut reliance can shift score distributions differently for male and female speakers, leading to subgroup error-rate gaps when deploying a single global decision threshold.
  • Figure 2: Overview of Fair-Gate. The encoder produces frame-level features $\mathbf{U}$, which are complementarily soft-routed by a local mask $A$ (gate) into an identity branch and a sex branch. The identity branch produces the embedding $z_{\mathrm{id}}$, which is the only representation used for automatic speaker verification (ASV) at inference. During training, the sex branch learns a sex embedding $z_{\mathrm{sex}}$ and predicts proxy sex labels $\hat{s}$ via a sex classifier ($\mathcal{L}_{\mathrm{sex}}$). The identity branch is optimized for speaker classification ($\mathcal{L}_{\mathrm{spk}}$), regularized by Risk Extrapolation (REx) across proxy sex groups ($\mathcal{L}_{\mathrm{rex}}$), and constrained by an adversarial sex classifier implemented through a Gradient Reversal Layer (GRL) ($\mathcal{L}_{\mathrm{adv}}$). A decorrelation loss ($\mathcal{L}_{\mathrm{decor}}$) encourages separation between $z_{\mathrm{id}}$ and $z_{\mathrm{sex}}$, while gate regularizers ($\mathcal{L}_{\mathrm{cap}}$, $\mathcal{L}_{\mathrm{sat}}$) prevent degenerate routing.
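
The two mechanisms named in the Figure 2 caption, complementary soft routing and the REx penalty across proxy sex groups, can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the mask is taken to be an element-wise sigmoid of hypothetical gate logits, the mask $A$ is taken to weight the identity branch with $1 - A$ weighting the sex branch, and the REx term is taken to be the population variance of per-group mean losses (the variance form of risk extrapolation). The function names `complementary_route` and `rex_variance_penalty` are hypothetical.

```python
import math
from statistics import mean, pvariance


def complementary_route(features, gate_logits):
    """Softly split frame-level features into an identity part and a sex part.

    features:    list of frames, each a list of floats (stand-in for U).
    gate_logits: same shape as features; hypothetical unnormalised gate scores.
    Returns (identity_part, sex_part, mask), where mask plays the role of A.
    """
    # Assumption: the soft mask is an element-wise sigmoid, so 0 < A < 1.
    mask = [[1.0 / (1.0 + math.exp(-g)) for g in row] for row in gate_logits]
    # Assumption: A weights the identity branch and (1 - A) the sex branch.
    identity_part = [[a * u for a, u in zip(m_row, u_row)]
                     for m_row, u_row in zip(mask, features)]
    sex_part = [[(1.0 - a) * u for a, u in zip(m_row, u_row)]
                for m_row, u_row in zip(mask, features)]
    return identity_part, sex_part, mask


def rex_variance_penalty(losses, groups):
    """Variance of the mean loss per group (variance form of REx).

    losses: per-utterance speaker-classification losses.
    groups: proxy sex label for each utterance, aligned with losses.
    """
    per_group = {}
    for loss, group in zip(losses, groups):
        per_group.setdefault(group, []).append(loss)
    group_risks = [mean(values) for values in per_group.values()]
    # Zero when all groups have the same mean risk; grows with the gap.
    return pvariance(group_risks)


# Toy inputs: two frames with two feature dimensions each.
U = [[1.0, -2.0], [0.5, 4.0]]
logits = [[0.0, 3.0], [-3.0, 0.0]]
to_id, to_sex, A = complementary_route(U, logits)

# Toy per-utterance losses with proxy sex labels.
penalty = rex_variance_penalty([0.2, 0.4, 0.9, 1.1], ["f", "f", "m", "m"])
```

Because the two masks sum to one, the identity and sex parts add back to the original features element by element, so the returned mask can be read directly as the share of each feature sent to the identity pathway. The penalty is zero only when both proxy groups have the same mean risk, which is the condition the REx regularizer pushes toward.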