Fair-Gate: Fairness-Aware Interpretable Risk Gating for Sex-Fair Voice Biometrics

Yangyang Qu, Massimiliano Todisco, Chiara Galdi, Nicholas Evans

Abstract

Voice biometric systems can exhibit sex-related performance gaps even when overall verification accuracy is strong. We attribute these gaps to two practical mechanisms: (i) demographic shortcut learning, where speaker classification training exploits spurious correlations between sex and speaker identity, and (ii) feature entanglement, where sex-linked acoustic variation overlaps with identity cues and cannot be removed without degrading speaker discrimination. We propose Fair-Gate, a fairness-aware and interpretable risk-gating framework that addresses both mechanisms in a single pipeline. Fair-Gate applies risk extrapolation to reduce variation in speaker-classification risk across proxy sex groups, and introduces a local complementary gate that routes intermediate features into an identity branch and a sex branch. The gate provides interpretability by producing an explicit routing mask that can be inspected to understand which features are allocated to identity versus sex-related pathways. Experiments on VoxCeleb1 show that Fair-Gate improves the utility-fairness trade-off, yielding more sex-fair automatic speaker verification (ASV) performance under challenging evaluation conditions.
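The risk-extrapolation idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common variance-penalty form of REx (mean of per-group risks plus a weighted variance of those risks); the paper's exact $\mathcal{L}_{\mathrm{rex}}$ is not given in this excerpt, and the function names and the weight `beta` are hypothetical.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Per-utterance softmax cross-entropy for speaker classification."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def rex_objective(logits, speaker_labels, group_labels, beta=1.0):
    """Mean group risk plus beta times the variance of the group risks.

    group_labels holds the proxy sex group of each utterance (assumed
    encoding: small integers). beta is a hypothetical trade-off weight.
    """
    losses = cross_entropy(logits, speaker_labels)
    # One risk per proxy group: the average speaker-classification loss.
    risks = np.array([losses[group_labels == g].mean()
                      for g in np.unique(group_labels)])
    return risks.mean() + beta * risks.var(), risks
```

When the two proxy groups incur the same average speaker-classification loss, the variance term vanishes and the objective reduces to the ordinary mean risk; any gap between the groups is penalized in proportion to `beta`.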

Paper Structure (22 sections, 16 equations, 2 figures, 3 tables)

Introduction
Fair-Gate Framework
Encoder
Local Complementary Gating
Mask computation and routing
Gate regularization
Routing mass control
Saturation constraint
Identity and Sex Branches Objectives
Speaker classification
Adversarial constraint
Sex classification
Embedding decorrelation
Risk Variance Equalization
Overall Training Objective
...and 7 more sections

Figures (2)

Figure 1: Desired decision vs. demographic shortcut in speaker verification under a shared threshold. A verifier should base its decision on identity evidence by comparing enrollment and test utterances (solid arrow). However, because sex affects acoustics (e.g., $F_0$ and formant structure) and can be spuriously correlated with speaker identity in the training data, the model may also exploit sex-linked cues as a shortcut (dashed arrow). Such shortcut reliance can shift score distributions differently for male and female speakers, leading to subgroup error-rate gaps when deploying a single global decision threshold.
Figure 2: Overview of Fair-Gate. The encoder produces frame-level features $\mathbf{U}$, which are complementarily soft-routed by a local mask $A$ (gate) into an identity branch and a sex branch. The identity branch produces the embedding $z_{\mathrm{id}}$, which is the only representation used for automatic speaker verification (ASV) at inference. During training, the sex branch learns a sex embedding $z_{\mathrm{sex}}$ and predicts proxy sex labels $\hat{s}$ via a sex classifier ($\mathcal{L}_{\mathrm{sex}}$). The identity branch is optimized for speaker classification ($\mathcal{L}_{\mathrm{spk}}$), regularized by Risk Extrapolation (REx) across proxy sex groups ($\mathcal{L}_{\mathrm{rex}}$), and constrained by an adversarial sex classifier implemented through a Gradient Reversal Layer (GRL) ($\mathcal{L}_{\mathrm{adv}}$). A decorrelation loss ($\mathcal{L}_{\mathrm{decor}}$) encourages separation between $z_{\mathrm{id}}$ and $z_{\mathrm{sex}}$, while gate regularizers ($\mathcal{L}_{\mathrm{cap}}$, $\mathcal{L}_{\mathrm{sat}}$) prevent degenerate routing.
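The complementary soft routing described in the Figure 2 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the gate is taken to be a per-frame linear map followed by a sigmoid, the branches are reduced to mean pooling, and the names `complementary_route`, `W`, and `b` are hypothetical. The features are random placeholders, not real speech data.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def complementary_route(U, W, b):
    """Split frame-level features U (T, D) with a soft mask A in (0, 1).

    Assumed gate: A = sigmoid(U @ W + b). The identity branch receives
    A * U and the sex branch receives (1 - A) * U, so the two routed
    streams always sum back to U.
    """
    A = sigmoid(U @ W + b)
    U_id = A * U
    U_sex = (1.0 - A) * U
    return A, U_id, U_sex

rng = np.random.default_rng(0)
U = rng.standard_normal((50, 8))       # 50 frames, 8 channels (toy sizes)
W = 0.1 * rng.standard_normal((8, 8))  # hypothetical gate weights
b = np.zeros(8)

A, U_id, U_sex = complementary_route(U, W, b)
z_id = U_id.mean(axis=0)    # stand-in for the identity embedding
z_sex = U_sex.mean(axis=0)  # stand-in for the sex embedding

# Inspecting the mask: average share of each channel sent to the identity branch.
identity_share = A.mean(axis=0)
```

Because the two masks are `A` and `1 - A`, no feature mass is discarded by the routing itself. Inspecting `identity_share` is one simple way to read the mask, in the spirit of the interpretability claim in the abstract.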
