A Comparative Study on Synthetic Facial Data Generation Techniques for Face Recognition

Pedro Vidal; Bernardo Biesseck; Luiz E. L. Coelho; Roger Granada; David Menotti

TL;DR

This paper addresses privacy, bias, and data scarcity challenges in facial recognition by comparing state-of-the-art synthetic facial data generation methods. It covers GAN-, diffusion-, and 3D-rendering–based approaches, evaluating their ability to capture realistic pose, lighting, aging, and occlusion variations across eight benchmarks. The key finding is that diffusion-based methods (e.g., Arc2Face, DCFace) significantly narrow the synthetic-to-real performance gap, though some methods still lag behind large real datasets like WebFace4M, especially on in-the-wild benchmarks. The work highlights ongoing issues such as demographic representation, computational cost, and the need for robust, standardized evaluation to guide ethical deployment of synthetic data for face recognition training.

Abstract

Facial recognition has become a widely used method for authentication and identification, with applications for secure access and locating missing persons. Its success is largely attributed to deep learning, which leverages large datasets and effective loss functions to learn discriminative features. Despite these advances, facial recognition still faces challenges in explainability, demographic bias, privacy, and robustness to aging, pose variations, lighting changes, occlusions, and facial expressions. Privacy regulations have also led to the degradation of several datasets, raising legal, ethical, and privacy concerns. Synthetic facial data generation has been proposed as a promising solution. It mitigates privacy issues, enables experimentation with controlled facial attributes, alleviates demographic bias, and provides supplementary data to improve models trained on real data. This study compares the effectiveness of synthetic facial datasets generated using different techniques in facial recognition tasks. We evaluate accuracy, rank-1, rank-5, and the true positive rate at a false positive rate of 0.01% on eight leading datasets, offering a comparative analysis not extensively explored in the literature. Results demonstrate the ability of synthetic data to capture realistic variations while emphasizing the need for further research to close the performance gap with real data. Techniques such as diffusion models, GANs, and 3D models show substantial progress; however, challenges remain.
