CemiFace: Center-based Semi-hard Synthetic Face Generation for Face Recognition

Zhonglin Sun, Siyang Song, Ioannis Patras, Georgios Tzimiropoulos

TL;DR

This paper proposes a novel diffusion-based approach, Center-based Semi-hard Synthetic Face Generation (CemiFace), which produces facial samples with various levels of similarity to the subject center, enabling the generation of face datasets that contain effective discriminative samples for training face recognition models.

Abstract

Privacy is a major concern in developing face recognition techniques. Although synthetic face images can partially mitigate potential legal risks while maintaining effective face recognition (FR) performance, FR models trained on face images synthesized by existing generative approaches frequently suffer from performance degradation due to the insufficient discriminative quality of the synthesized samples. In this paper, we systematically investigate what contributes to solid face recognition model training, and reveal that face images with a certain degree of similarity to their identity centers are highly effective for training FR models. Inspired by this, we propose a novel diffusion-based approach, Center-based Semi-hard Synthetic Face Generation (CemiFace), which produces facial samples with various levels of similarity to the subject center, enabling the generation of face datasets that contain effective discriminative samples for training face recognition. Experimental results show that with a modest degree of similarity, training on the generated dataset can produce competitive performance compared to previous generation methods.
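To make the notion of "similarity to the identity center" concrete, the following is a minimal sketch of how samples of one identity could be grouped by cosine similarity to their center. It is an illustration only, not the authors' code: the function names, the choice of the normalized mean embedding as the center, and the bin edges are all assumptions, and the toy 2-D vectors stand in for embeddings from a real face recognition model.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def identity_center(embeddings):
    """Assumed center: mean of the unit-length embeddings, re-normalized."""
    unit = [l2_normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)

def cosine_to_center(embedding, center):
    """Cosine similarity between one embedding and a unit-length center."""
    e = l2_normalize(embedding)
    s = sum(a * b for a, b in zip(e, center))
    return max(-1.0, min(1.0, s))  # guard against floating-point overshoot

def group_by_similarity(embeddings, edges):
    """Put each sample index into the bin [edges[b], edges[b+1]) it falls in.

    The last bin also includes its upper edge. `edges` is a hypothetical
    choice of similarity thresholds, sorted in increasing order.
    """
    center = identity_center(embeddings)
    n_bins = len(edges) - 1
    groups = {b: [] for b in range(n_bins)}
    for idx, e in enumerate(embeddings):
        s = cosine_to_center(e, center)
        for b in range(n_bins):
            is_last = b == n_bins - 1
            if edges[b] <= s < edges[b + 1] or (is_last and s == edges[-1]):
                groups[b].append(idx)
                break
    return groups

# Toy embeddings of a single identity (hypothetical values).
toy = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
groups = group_by_similarity(toy, edges=[-1.0, 0.5, 1.0])
```

In this toy run, the third sample points away from the other two, so it lands in the low-similarity bin while the first two land in the high-similarity bin.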

Paper Structure (40 sections, 14 equations, 9 figures, 15 tables, 2 algorithms)

This paper contains 40 sections, 14 equations, 9 figures, 15 tables, and 2 algorithms.

Figures (9)

  • Figure 1: Visualization of samples with different similarities. An inquiry image defines a hypersphere based on similarity to that image, where samples with the same similarity lie at the same radius. Samples with similarities from 0 to 1 at intervals of 0.33 are shown. With our proposed CemiFace, each inquiry image ultimately forms a novel subject.
  • Figure 2: Samples from different similarity groups in the CASIA-WebFace dataset. From left to right, samples have progressively lower similarity to the identity center.
  • Figure 3: Illustration of our proposed method. The left part is the training framework for learning images with various levels of similarity. First, noise is added to the clean facial image before it is processed by the diffusion model. Then the similarity controlling condition $\mathbf{m}$, ranging over [-1,1], is injected together with the facial embedding to guide the generation. The model outputs the estimated noise, which is used to calculate the estimated image. We add a similarity matching loss $L_{\mathbf{SimMat}}$ between the estimated image and the input image. For generation, we gradually denoise a noisy image as the time step goes from $\mathbf{T}$ to 0, while the conditions for identity and similarity are kept fixed. The two diffusion models in the generation part denote the same diffusion model at two different time steps. The bottom-right part details how cross-attention is used to inject the similarity condition and facial embedding into the diffusion model.
  • Figure 4: Accuracy for samples with similarity varying from -1 to 1. The left plot shows the performance on each evaluation dataset. The right plot shows the average accuracy of our CemiFace.
  • Figure 5: Sample visualization under different similarities. From left to right: inquiry images (first column), images with m from 1 to -1, and samples generated by DCFace. Different rows in each inquiry group show the results produced by different noises. The yellow dashed box encloses the samples for which we obtain the best accuracy. Pink dashed boxes enclose samples that vary vastly.
  • ...and 4 more figures
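
The training step described in the Figure 3 caption (add noise, predict it, recover an estimated image, compare it with the input under a similarity condition m) can be sketched roughly as follows. The noising and image-estimate formulas are the standard DDPM relations, which the caption does not state explicitly and are assumed here. The form of the similarity matching loss is not given in this excerpt, so the squared-error form below, the `embed` placeholder (standing in for a face recognition model) and all function names are hypothetical.

```python
import math

def add_noise(x0, eps, alpha_bar_t):
    """Standard DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]

def estimate_x0(x_t, eps_pred, alpha_bar_t):
    """Invert the forward step using the predicted noise."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - b * e) / a for x, e in zip(x_t, eps_pred)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return dot / (norm_u * norm_v)

def similarity_matching_loss(x0_hat, x0, m, embed=lambda img: img):
    """ASSUMED loss form: push cos(embed(x0_hat), embed(x0)) toward m.

    `embed` is a placeholder (identity by default); a real setup would use
    a face recognition network here. m is expected to lie in [-1, 1].
    """
    return (cosine(embed(x0_hat), embed(x0)) - m) ** 2

# Toy "image" as a flat list of pixel values (hypothetical data).
x0 = [0.2, -0.5, 0.9, 0.1]
eps = [0.3, -1.1, 0.4, 0.7]
x_t = add_noise(x0, eps, alpha_bar_t=0.5)

# With a perfect noise prediction, the estimate recovers the input image.
x0_hat = estimate_x0(x_t, eps, alpha_bar_t=0.5)
loss_same = similarity_matching_loss(x0_hat, x0, m=1.0)
```

Under this assumed loss, a perfectly reconstructed image incurs almost no penalty at m = 1, while smaller values of m would penalize an exact reconstruction and so favor outputs that sit further from the input in embedding space.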