Synthetic Data for the Mitigation of Demographic Biases in Face Recognition

Pietro Melzi, Christian Rathgeb, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Dominik Lawatsch, Florian Domin, Maxim Schaubert

TL;DR

The paper tackles demographic bias in face recognition by fine-tuning state-of-the-art systems with synthetic data generated by GANDiffFace. ArcFace and CosFace are fine-tuned on two synthetic datasets (Syn-Asian and Syn-Mixed) and then evaluated on the real-world datasets DiveFace and RFW, with fairness measured by three metrics: Fairness Discrepancy Rate (FDR), Inequity Rate (IR), and Gini Aggregation Rate for Biometric Equitability (GARBE). The effects are model-dependent: ArcFace benefits from Syn-Mixed, while Syn-Asian can impair its fairness; CosFace gains from Syn-Asian and sees only limited benefit from Syn-Mixed. The work shows that synthetic data can reduce demographic disparities while preserving recognition performance, which points to synthetic-data benchmarking and augmentation as a promising direction for fairer face recognition systems.

Abstract

This study investigates the possibility of mitigating the demographic biases that affect face recognition technologies through the use of synthetic data. Demographic biases can harm individuals from specific demographic groups, and can be identified by observing disparate performance of face recognition systems across those groups. They arise primarily from the unequal representation of demographic groups in the training data. Recently, synthetic data have emerged as a solution to some of the problems that affect face recognition systems. In particular, the generation process makes it possible to specify the desired demographic and facial attributes of the images, and thus to control the demographic distribution of the synthesized dataset and represent the different demographic groups fairly. We propose to take existing face recognition systems that exhibit demographic biases and fine-tune them with synthetic data. We use synthetic datasets generated with GANDiffFace, a novel framework able to synthesize datasets for face recognition with controllable demographic distribution and realistic intra-class variations. We consider multiple datasets representing different demographic groups for training and evaluation. We also fine-tune different face recognition systems and evaluate their demographic fairness with different metrics. Our results support the proposed approach and the use of synthetic data to mitigate demographic biases in face recognition.

Paper Structure (18 sections, 9 equations, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Overview of the proposed approach: Face Recognition (FR) systems with low demographic fairness are fine-tuned with demographic-specific or demographic-mixed datasets synthesized with GANDiffFace (Melzi et al., 2023). Real-world datasets representing different demographic groups (e.g., $d_1$ = African, $d_2$ = Asian, $d_3$ = Caucasian) are used to assess whether fairness increases in the fine-tuned FR systems.
  • Figure 2: Identities (one per row) and intra-class variations generated with GANDiffFace (Melzi et al., 2023) for different demographic groups.
  • Figure 3: Overview of GANDiffFace (Melzi et al., 2023), which combines GAN and Diffusion models. GANDiffFace creates synthetic datasets for face recognition with the properties listed in blue.