Score Normalization for Demographic Fairness in Face Recognition

Yu Linghu, Tiago de Freitas Pereira, Christophe Ecabert, Sébastien Marcel, Manuel Günther

TL;DR

This work tackles demographic fairness in face recognition by shifting from model retraining to post-processing score normalization. It introduces a taxonomy of nine techniques (M1–M5 and FSN) that leverage impostor and genuine score distributions, ranging from identity-based to pure cohort-based calibrations, including Platt scaling and a Bayesian-inspired approach. The methods are evaluated on two datasets (VGGFace2 and RFW) across six protocols and five pre-trained networks, showing consistent fairness gains at low false-match rates (e.g., around $\tau=10^{-3}$) without degrading verification performance, with impostor- and cohort-based approaches (notably M1–M3) delivering robust debiasing. A key finding is that balancing FMR and FNMR contributions matters for maximal gains, underscoring the need to jointly consider both error types in fairness evaluation and suggesting future work on tail-distribution modeling.

Abstract

Fair biometric algorithms have similar verification performance across different demographic groups given a single decision threshold. Unfortunately, for state-of-the-art face recognition networks, score distributions differ between demographics. Contrary to work that tries to align those distributions by extra training or fine-tuning, we solely focus on score post-processing methods. As proved, well-known sample-centered score normalization techniques, Z-norm and T-norm, do not improve fairness for high-security operating points. Thus, we extend the standard Z/T-norm to integrate demographic information in normalization. Additionally, we investigate several possibilities to incorporate cohort similarities for both genuine and impostor pairs per demographic to improve fairness across different operating points. We run experiments on two datasets with different demographics (gender and ethnicity) and show that our techniques generally improve the overall fairness of five state-of-the-art pre-trained face recognition networks, without downgrading verification performance. We also indicate that an equal contribution of False Match Rate (FMR) and False Non-Match Rate (FNMR) in fairness evaluation is required for the highest gains. Code and protocols are available.
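To make the idea of integrating demographic information into Z-norm concrete, here is a minimal sketch. It is not the paper's M1–M5 or FSN definitions, which are not given in this summary. It assumes the textbook form of Z-norm (subtract the mean of cohort impostor scores and divide by their standard deviation) and a hypothetical variant, `demographic_z_norm`, that draws those statistics only from the cohort scores of one demographic group. The function names, group labels, and toy scores are all invented for illustration.

```python
import numpy as np


def z_norm(raw_score, cohort_impostor_scores):
    """Standardize a raw score with the mean and standard deviation
    of a set of cohort impostor scores (textbook Z-norm form)."""
    cohort = np.asarray(cohort_impostor_scores, dtype=float)
    return (raw_score - cohort.mean()) / cohort.std()


def demographic_z_norm(raw_score, group, cohort_by_group):
    """Hypothetical variant: use only the cohort impostor scores
    that belong to the given demographic group."""
    return z_norm(raw_score, cohort_by_group[group])


# Invented toy cohort: impostor scores of group "B" sit 0.2 higher than group "A".
cohort_by_group = {"A": [0.1, 0.2, 0.3], "B": [0.3, 0.4, 0.5]}

# Two raw scores that sit equally far above their own group's impostor mean.
norm_a = demographic_z_norm(0.6, "A", cohort_by_group)
norm_b = demographic_z_norm(0.8, "B", cohort_by_group)
print(norm_a, norm_b)
```

In this toy setup the raw scores 0.6 and 0.8 differ, but after group-specific normalization they land on approximately the same value, so a single threshold treats both groups alike. Real score distributions also differ in shape, which is one reason a simple shift-and-scale may not be sufficient.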

Paper Structure (15 sections, 5 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Improved Fairness Through Score Normalization. The original scores on the left have different False Match Rates (FMR, red area) and False Non-Match Rates (FNMR, green area) for different demographics under the same score threshold. Through modeling of score distributions from a cohort, we normalize scores such that they provide more similar FMR and FNMR across demographics, thereby improving demographic fairness. Normalization techniques in red text use cohort impostor scores only, blue ones also incorporate cohort genuine scores.
  • Figure 2: RFW Protocol Comparison. This figure displays the distributions of baseline genuine and impostor scores for the four ethnicities on the original RFW protocol and on our random RFW protocol, computed with the E2 network (cf. Tab. \ref{tab:network}).
  • Figure 3: Impostor vs ALL. This figure compares the VGGFace2 ethnicity score distributions of the baseline, the impostor-based method M1.1, and the impostor-genuine-based methods M4 and M5. Features are extracted by E3.
  • Figure 4: Distribution of $\delta$. This figure shows the distribution of $\delta$, the difference between the FMR and FNMR contributions, for the baseline and for each method.
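The situation described in the Figure 1 caption, where one score threshold yields different FMR and FNMR per demographic, can be sketched as follows. This is an illustrative example, not the paper's evaluation code. It assumes similarity scores where a pair is accepted when its score is at or above the threshold. The helper `fmr_fnmr`, the group labels, and all scores are invented.

```python
import numpy as np


def fmr_fnmr(genuine_scores, impostor_scores, threshold):
    """Return (FMR, FNMR) at one threshold, assuming higher scores mean
    more similar and a pair is accepted when score >= threshold."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    fmr = float(np.mean(impostor >= threshold))  # impostor pairs wrongly accepted
    fnmr = float(np.mean(genuine < threshold))   # genuine pairs wrongly rejected
    return fmr, fnmr


# Invented toy scores for two demographic groups.
scores_by_group = {
    "A": {"genuine": [0.9, 0.8, 0.7, 0.4], "impostor": [0.1, 0.2, 0.3, 0.6]},
    "B": {"genuine": [0.9, 0.6, 0.45, 0.3], "impostor": [0.2, 0.55, 0.6, 0.7]},
}

threshold = 0.5  # a single decision threshold shared by all groups
rates = {
    group: fmr_fnmr(s["genuine"], s["impostor"], threshold)
    for group, s in scores_by_group.items()
}
for group, (fmr, fnmr) in rates.items():
    print(f"group {group}: FMR={fmr:.2f}, FNMR={fnmr:.2f}")
```

With these toy numbers, group B has both a higher FMR and a higher FNMR than group A at the same threshold. This is the kind of gap that score normalization aims to shrink.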