VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition

Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam, Junghyun Cho, Ig-Jae Kim

TL;DR

VIGFace tackles privacy concerns in face recognition by pre-assigning virtual identities in the FR feature space and training a diffusion-based generator to produce authentic-looking faces conditioned on five-point landmarks. By keeping virtual prototypes orthogonal to real ones and jointly optimizing with ArcFace losses, the method achieves strong separability while preventing identity leakage. The approach demonstrates that synthetic, privacy-free data can substitute real datasets and also serve as a valuable augmentation to boost FR performance, attaining state-of-the-art results on multiple benchmarks. The work further provides a practical dataset release of virtual identities to help mitigate portrait-right issues in FR research.

Abstract

Deep learning-based face recognition continues to face challenges due to its reliance on huge datasets obtained from web crawling, which can be costly to gather and raise significant real-world privacy concerns. To address this issue, we propose VIGFace, a novel framework capable of generating synthetic facial images. Our idea originates from pre-assigning virtual identities in the feature space. Initially, we train the face recognition model using a real face dataset and create a feature space for both real and virtual identities, where virtual prototypes are orthogonal to other prototypes. Subsequently, we train the diffusion model based on the established feature space, enabling it to generate authentic human face images from real prototypes and synthesize virtual face images from virtual prototypes. Our proposed framework provides two significant benefits. Firstly, it shows clear separability between existing individuals and virtual face images, allowing one to create synthetic images with confidence and without concerns about privacy and portrait rights. Secondly, it ensures improved performance through data augmentation by incorporating real existing images. Extensive experiments demonstrate the superiority of our virtual face dataset and framework, outperforming the previous state-of-the-art on various face recognition benchmarks.
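
The idea of a shared feature space for real and virtual identities can be illustrated with a small sketch. It is not the authors' implementation: it only shows how ArcFace-style logits could be computed once virtual prototypes are appended to the real ones. The toy 4-D prototypes, the function names, and the scale and margin values (64 and 0.5, common ArcFace defaults) are assumptions and do not come from this page. The virtual prototypes are hand-picked to be orthogonal to the real ones, whereas the paper obtains this property through training.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def arcface_logits(embedding, prototypes, target, scale=64.0, margin=0.5):
    """Simplified ArcFace logits: s*cos(theta + m) for the target class,
    s*cos(theta) for every other class. The usual handling of
    theta + m > pi is omitted for brevity."""
    logits = []
    for idx, proto in enumerate(prototypes):
        cos_t = max(-1.0, min(1.0, cosine(embedding, proto)))
        if idx == target:
            logits.append(scale * math.cos(math.acos(cos_t) + margin))
        else:
            logits.append(scale * cos_t)
    return logits

# Hypothetical toy feature space: two real prototypes (W_R) and
# two virtual prototypes (W_V), chosen by hand to be orthogonal.
W_R = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
W_V = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
prototypes = W_R + W_V  # the classifier now covers real + virtual IDs

# An embedding of a real image that belongs to real identity 0.
embedding = [0.9, 0.1, 0.0, 0.0]
logits = arcface_logits(embedding, prototypes, target=0)
```

In this toy setup the logits for the two virtual classes are zero because the embedding has no component along the virtual prototypes, which is the kind of separation between real and virtual identities the abstract describes.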

Paper Structure (33 sections, 12 equations, 20 figures, 5 tables)

Figures (20)

  • Figure 1: t-distributed Stochastic Neighbor Embedding (t-SNE) [van2008visualizing] plot of embeddings from real and synthesized images. The filled and outlined stars represent the real and virtual prototypes, while the filled and outlined circles indicate the embeddings of real and synthesized images, respectively. The bottom of the figure shows the face images included in each cluster; images with dotted outlines are face images generated using our method.
  • Figure 2: Pipeline for the proposed method. Conventional FR training includes prototypes for real individuals only, indicated as $W_{R}$. We add $k$ prototypes for virtual IDs, denoted as $W_{V}$. The virtual embedding $f'_{\mathrm{FR}}(x_{j})$ corresponding to the virtual person ID $j$ is generated to follow the distribution of the real embeddings. To synthesize facial images from virtual prototypes, we adopt the DiT architecture [peebles2023scalable], which follows the design approach of the Vision Transformer (ViT) [dosovitskiy2010image]. Additionally, we adjust the DiT model to use 5-point landmark images to handle pose variations.
  • Figure 3: Changes in the similarity matrix of the prototypes from our method. Similarity values were min-max normalized.
  • Figure 4: Virtual faces generated in VIGFace (B). Each row lists facial images of one identity under different poses, specified using five facial landmarks. Our method can generate face images under varied conditions, such as illumination, occlusion by accessories, and facial expressions, while controlling the pose of the face.
  • Figure 5: Comparison of virtual ID images generated by conventional methods [qiu2021synface, bae2023digiface, kim2023dcface, boutros2023idiff, melzi2023gandiffface, sun2025cemiface, wu2024vec2face] and by our method, trained on CASIA-WebFace. For each synthetic dataset, we present two subjects in a single row.
  • ...and 15 more figures
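
The min-max normalization mentioned in the Figure 3 caption can be sketched as follows. This is a minimal illustration, assuming the similarity between prototypes is cosine similarity and that normalization is applied over the whole matrix at once; the caption states neither detail. The prototype values and function names are hypothetical.

```python
import math

def cosine_similarity_matrix(prototypes):
    """Pairwise cosine similarity between prototype vectors."""
    unit = []
    for p in prototypes:
        norm = math.sqrt(sum(x * x for x in p))
        unit.append([x / norm for x in p])
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]

def min_max_normalize(matrix):
    """Rescale all entries of a matrix to the range [0, 1]."""
    flat = [x for row in matrix for x in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant matrix: avoid division by zero
        return [[0.0 for _ in row] for row in matrix]
    return [[(x - lo) / (hi - lo) for x in row] for row in matrix]

# Hypothetical prototypes: the first two are correlated,
# the third is orthogonal to both.
toy_prototypes = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]
sim = cosine_similarity_matrix(toy_prototypes)
sim_norm = min_max_normalize(sim)
```

After normalization the most similar pair maps to 1 and the least similar pair to 0, so an orthogonal prototype shows up as a row and column of low values off the diagonal.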