UIFace: Unleashing Inherent Model Capabilities to Enhance Intra-Class Diversity in Synthetic Face Recognition

Xiao Lin, Yuge Huang, Jianqing Xu, Yuxi Mi, Shuigeng Zhou, Shouhong Ding

TL;DR

UIFace addresses privacy and labeling concerns in real face datasets by proposing a diffusion-based synthetic data generator that enhances intra-class diversity while preserving identity. It achieves this via a two-stage sampling strategy that first uses a learnable empty context $c_e$ to induce variation and then employs a fixed identity context $c$ to recover identity, guided by an adaptive boundary $t_0$; an attention-injection module further leverages unconditional generation to enrich diversity. Empirical results show UIFace outperforms prior synthetic-face methods with fewer synthetic identities and can reach parity with FR models trained on real data as synthetic identities increase. This approach reduces reliance on real data while delivering diverse, high-quality synthetic faces for robust FR training.

Abstract

Face recognition (FR) stands as one of the most crucial applications in computer vision. The accuracy of FR models has significantly improved in recent years due to the availability of large-scale human face datasets. However, directly using these datasets can inevitably lead to privacy and legal problems. Generating synthetic data to train FR models is a feasible solution to circumvent these issues. While existing synthetic-based face recognition methods have made significant progress in generating identity-preserving images, they are severely plagued by context overfitting, resulting in a lack of intra-class diversity in the generated images and poor face recognition performance. In this paper, we propose a framework to Unleash the Inherent capability of the model to enhance intra-class diversity for synthetic face recognition, shortened as UIFace. Our framework first trains a diffusion model that can perform sampling conditioned on either identity contexts or a learnable empty context. The former generates identity-preserving images but lacks variation, while the latter exploits the model's intrinsic ability to synthesize intra-class-diversified images but with random identities. Then we adopt a novel two-stage sampling strategy during inference to fully leverage the strengths of both types of contexts, resulting in images that are diverse as well as identity-preserving. Moreover, an attention injection module is introduced to further augment the intra-class variations by utilizing attention maps from the empty context to guide the sampling process in ID-conditioned generation. Experiments show that our method significantly surpasses previous approaches with even less training data and half the size of the synthetic dataset. The proposed UIFace even achieves comparable performance with FR models trained on real datasets when we further increase the number of synthetic identities.
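
The two-stage sampling strategy can be illustrated with a minimal Python sketch. It is not the authors' implementation: `two_stage_sample` and `denoise_step` are hypothetical names, the boundary `t0` is passed in as a plain integer (the adaptive procedure that selects it is not described here), and the sketch assumes that reverse sampling runs from timestep $T$ down to 1, so the first stage covers the larger timesteps.

```python
from typing import Any, Callable


def two_stage_sample(
    denoise_step: Callable[[Any, int, Any], Any],
    x_T: Any,
    T: int,
    t0: int,
    empty_ctx: Any,
    id_ctx: Any,
) -> Any:
    """Run a reverse diffusion loop that switches context at boundary t0.

    denoise_step(x, t, ctx) is a hypothetical single reverse step; in
    practice it would wrap the trained diffusion model.
    """
    if not 0 <= t0 <= T:
        raise ValueError("t0 must lie in [0, T]")

    x = x_T
    for t in range(T, 0, -1):
        # Stage 1 (t > t0): empty context, which induces variation.
        # Stage 2 (t <= t0): identity context, which recovers identity.
        ctx = empty_ctx if t > t0 else id_ctx
        x = denoise_step(x, t, ctx)
    return x
```

Under these assumptions, `t0 = T` reduces to purely identity-conditioned sampling and `t0 = 0` to purely unconditional sampling, so the boundary sets the balance between identity preservation and diversity.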

Paper Structure

This paper contains 18 sections, 6 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: a) Visualization of samples from different datasets. Samples from the real face dataset CASIA exhibit variations in attributes such as pose, expression and illumination. However, when we synthesize face data using the previous method IDiff-Face (Boutros et al., 2023), the generated images show poor diversity, specifically similar expressions and poses, which is caused by identity overfitting. In contrast, our method can generate a wider variety of images, thereby enhancing the accuracy of the trained FR model. b) Quantitative comparison of dataset diversity. We apply LPIPS and Improved Recall to measure the diversity of different datasets. A higher value indicates better diversity. c) Quantitative comparison of the final accuracy of the FR models.
  • Figure 2: Effects of different sampling timesteps on identity. The x-axis represents the timestep intervals in which identity contexts are used as conditions. The empty context is used as a substitute in timesteps not covered by the intervals. The y-axis represents the intra-class similarity of the generated face images. The maximum sampling timestep $T$ is set to 1000.
  • Figure 3: Overview of the proposed UIFace. We propose a two-stage sampling strategy to unleash the intrinsic capability of the model to generate diverse images. In the first stage, the model performs unconditional generation based on the empty context $c_e$. In the second stage, the model restores identity-relevant details conditioned on specific identity contexts $c$. We further propose an adaptive stage partition strategy to determine the boundary $t_0$ between these two stages, and an attention injection module to enhance the diversity of the synthetic dataset while maintaining identities.
  • Figure 4: Visualization results of different attention injection strategies (Top: unconditional generation; Middle: attention injection with vanilla replacement; Bottom: the proposed attention injection).
  • Figure 5: Genuine and imposter comparisons. (Left: baseline; Mid: UIFace; Right: CASIA).
  • ...and 1 more figure
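
Figure 2 reports intra-class similarity, but this summary does not say how the score is computed. A common choice, assumed here rather than taken from the paper, is the mean pairwise cosine similarity between face embeddings of images that share an identity. The sketch below uses plain Python lists as stand-in embeddings; `intra_class_similarity` is a hypothetical helper name.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def intra_class_similarity(embeddings: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity over all unordered pairs within one identity."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

Under this definition, a higher score means the images of one identity look more alike to the embedding model, which indicates stronger identity consistency but possibly lower diversity.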