Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models
Anjith George, Sebastien Marcel
TL;DR
The paper tackles the realism gap in synthetic face data for face recognition by introducing Digi2Real, a realism-transfer framework that reuses DigiFace identities and enhances realism through Arc2Face-based generation, CLIP-space alignment with a learned offset $\Delta$, and SLERP-driven intra-class variation. This hybrid approach combines a graphics-rendering pipeline with foundation-model-backed realism to produce Digi2Real-20K, a synthetic dataset that yields substantial improvements over DigiFace and competitive results relative to state-of-the-art synthetic datasets, especially on the IJB-B and IJB-C benchmarks. A key finding is that modest real-data augmentation (e.g., on the order of 1,000 identities) can further close the gap between synthetic- and real-data training, suggesting a practical path for privacy-preserving FR systems. The work demonstrates the potential of realism transfer to enable large-scale, controllable synthetic data with strong downstream performance, and it provides public code and data resources to foster further research.
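As a rough illustration of the SLERP step mentioned above, the following is a minimal sketch of spherical linear interpolation between two embedding vectors. It is not the authors' implementation: the function name `slerp`, the choice to normalize both inputs to unit length, and the embedding dimension in the usage lines are assumptions made for illustration only. How Digi2Real selects the endpoints and the interpolation fraction is not specified in this summary.

```python
import numpy as np


def slerp(a, b, t):
    """Spherical linear interpolation between the directions of a and b.

    Assumption for this sketch: both inputs are normalized to unit length
    first, so the result also lies on the unit hypersphere.
    """
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    # Angle between the two unit vectors; clip guards against rounding error.
    omega = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    if omega < 1e-6:
        # The vectors are (almost) parallel, so there is nothing to interpolate.
        return a_unit
    sin_omega = np.sin(omega)
    w_a = np.sin((1.0 - t) * omega) / sin_omega
    w_b = np.sin(t * omega) / sin_omega
    return w_a * a_unit + w_b * b_unit


# Hypothetical usage: sample a few points on the arc between two embeddings.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=512)  # placeholder embedding, not real model output
emb_b = rng.normal(size=512)  # placeholder embedding, not real model output
variations = [slerp(emb_a, emb_b, t) for t in (0.1, 0.2, 0.3)]
```

Unlike plain linear interpolation, the interpolated vectors keep unit norm, which matters when the downstream model expects normalized embeddings.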
Abstract
The accuracy of face recognition systems has improved significantly in the past few years, thanks to the large amount of data collected and advancements in neural network architectures. However, these large-scale datasets are often collected without explicit consent, raising ethical and privacy concerns. To address this, there have been proposals to use synthetic datasets for training face recognition models. Yet, such models still rely on real data to train the generative models and generally exhibit inferior performance compared to those trained on real datasets. One of these datasets, DigiFace, uses a graphics pipeline to generate different identities and intra-class variations without using real data in model training. However, the performance of this approach is poor on face recognition benchmarks, possibly due to the lack of realism in the images generated by the graphics pipeline. In this work, we introduce a novel framework for realism transfer aimed at enhancing the realism of synthetically generated face images. Our method leverages a large-scale face foundation model, and we adapt the pipeline for realism enhancement. By integrating the controllable aspects of the graphics pipeline with our realism enhancement technique, we generate a large number of realistic variations, combining the advantages of both approaches. Our empirical evaluations demonstrate that models trained using our enhanced dataset significantly improve the performance of face recognition systems over the baseline. The source code and dataset will be publicly accessible at the following link: https://www.idiap.ch/paper/digi2real
