
Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion

Yuxi Mi, Zhizhou Zhong, Yuge Huang, Qiuyang Yuan, Xuan Zhao, Jianqing Xu, Shouhong Ding, ShaoMing Wang, Rizen Guo, Shuigeng Zhou

TL;DR

To create virtual faces, the generator is conditioned on novel identities taken from unlabeled synthetic faces and on novel styles statistically sampled from a real-world prior distribution; the sampling accounts for both intra-subject variation and subject distinctiveness.
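The summary does not specify how styles are sampled. As a minimal sketch, assuming a two-level Gaussian model over 3DMM style parameters (our assumption, not the authors' published procedure), subject distinctiveness can come from drawing a per-subject mean style from an inter-subject distribution, and intra-subject variation from drawing each image's style around that mean. The function `sample_subject_styles`, its covariance arguments, and the dimensions used are hypothetical.

```python
import numpy as np


def sample_subject_styles(mu_pop, cov_inter, cov_intra, n_subjects, n_images, seed=0):
    """Hypothetical two-level sampler for 3DMM style vectors (an assumption, not MorphFace's code).

    mu_pop:    (d,) population mean of style parameters, assumed estimated from real faces
    cov_inter: (d, d) spread of per-subject mean styles (subject distinctiveness)
    cov_intra: (d, d) spread of styles around one subject's mean (intra-subject variation)
    Returns an array of shape (n_subjects, n_images, d).
    """
    rng = np.random.default_rng(seed)
    d = mu_pop.shape[0]
    # Level 1: draw one mean style per virtual subject.
    subject_means = rng.multivariate_normal(mu_pop, cov_inter, size=n_subjects)  # (S, d)
    # Level 2: draw per-image offsets and add them to each subject's own mean.
    offsets = rng.multivariate_normal(np.zeros(d), cov_intra, size=(n_subjects, n_images))  # (S, N, d)
    return subject_means[:, None, :] + offsets


# Illustrative usage: 4 subjects, 50 images each, 6 style dimensions (sizes are arbitrary).
d = 6
styles = sample_subject_styles(np.zeros(d), 1.0 * np.eye(d), 0.1 * np.eye(d), n_subjects=4, n_images=50)
print(styles.shape)  # (4, 50, 6)
```

Sampling every image independently from the pooled population distribution would instead produce the subject-agnostic "mixed" styles that the paper argues obscure identity consistency.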

Abstract

Identity-preserving face synthesis aims to generate synthetic face images of virtual subjects that can substitute for real-world data when training face recognition models. While prior methods strive to create images with consistent identities and diverse styles, they face a trade-off between the two. Observing that these methods treat style variation as subject-agnostic, whereas real-world persons actually have distinct, subject-specific styles, this paper introduces MorphFace, a diffusion-based face generator. The generator learns fine-grained facial styles, e.g., shape, pose, and expression, from the renderings of a 3D morphable model (3DMM). It also learns identities from an off-the-shelf recognition model. To create virtual faces, the generator is conditioned on novel identities of unlabeled synthetic faces and on novel styles that are statistically sampled from a real-world prior distribution. In particular, the sampling accounts for both intra-subject variation and subject distinctiveness. A context blending strategy is employed to enhance the generator's responsiveness to identity and style conditions. Extensive experiments show that MorphFace outperforms the best prior methods in face recognition efficacy.
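The abstract names a context blending strategy without details; the Figure 5 caption adds that the style and identity contexts are amplified via classifier-free guidance (CFG) before and after a shifting timestep $t_0$, respectively. The sketch below is one hedged interpretation, not the authors' implementation. It assumes timesteps count down from T to 0 (so "before $t_0$" means t > t0), that guidance contrasts the fully conditioned prediction with one where the amplified context is replaced by a null context, and that `denoiser` is a hypothetical callable `(x_t, t, id_ctx, style_ctx) -> eps`. The linear `toy_denoiser` exists only to make the example runnable.

```python
import numpy as np


def blended_guidance(denoiser, x_t, t, id_ctx, style_ctx, null_ctx, t0, w=2.0):
    """Hypothetical CFG step that switches which context is amplified at timestep t0.

    Assumes t counts down from T to 0, so t > t0 is the early, noisier phase.
    """
    eps_full = denoiser(x_t, t, id_ctx, style_ctx)
    if t > t0:
        # Early phase: low-frequency structure (pose, shape) forms,
        # so amplify style by contrasting against a style-dropped prediction.
        eps_drop = denoiser(x_t, t, id_ctx, null_ctx)
    else:
        # Late phase: high-frequency identity details form,
        # so amplify identity by contrasting against an identity-dropped prediction.
        eps_drop = denoiser(x_t, t, null_ctx, style_ctx)
    return eps_full + w * (eps_full - eps_drop)


def toy_denoiser(x_t, t, id_ctx, style_ctx):
    # Linear stand-in for a trained noise predictor; it ignores t.
    return 0.1 * x_t + 0.5 * id_ctx + 0.3 * style_ctx


x = np.ones(4)
id_ctx, style_ctx, null_ctx = np.full(4, 2.0), np.full(4, 3.0), np.zeros(4)
early = blended_guidance(toy_denoiser, x, 800, id_ctx, style_ctx, null_ctx, t0=500)
late = blended_guidance(toy_denoiser, x, 200, id_ctx, style_ctx, null_ctx, t0=500)
```

With `w = 0` this reduces to the plain conditional prediction; the update `eps_full + w * (eps_full - eps_drop)` is the usual CFG form with guidance scale `1 + w`.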

Paper Structure

This paper contains 36 sections, 13 equations, 20 figures, 4 tables.

Figures (20)

  • Figure 1: Analyses of identity consistency and style variation across prior methods and our proposed MorphFace. Identity consistency is measured by pairwise cosine similarity and style variation by variances of DECA attributes. Intra-class and inter-class results are shown in red and blue, respectively. Better-separated curves indicate better identity consistency, and a larger shaded area indicates greater style variation. Prior methods fall short in either (a) style variation or (b) identity retention, while (c) MorphFace achieves both goals simultaneously.
  • Figure 2: Pipeline of MorphFace. It uses a pair of style and identity contexts to generate faces with a designated identity and diverse styles. Style is extracted using the DECA 3DMM to provide fine-grained, entirely parametric control. To sample virtual faces, unlabeled synthetic images are used as subject references, and styles are sampled statistically from a real-world prior distribution.
  • Figure 3: Sample 3DMM feature maps (here, Lambertian renderings) and their synthetic images. Training: precise style control and more fine-grained detail can be observed in the generated images. Sampling: subject-aware style sampling creates renderings and images with subject distinctiveness (e.g., in illumination).
  • Figure 4: Illustration of style distributions. Regions represent real-world style distributions and diamonds represent samples. (a) Insufficient style variation impairs face recognition (FR) generality. (b) Uniformly sampling styles yields a "mixed" distribution that obscures identity consistency. (c) In our proposed approach, both style variation and identity consistency are promoted by accounting for the distinctiveness of subjects.
  • Figure 5: During denoising, low-frequency (LF) styles (e.g., pose and shape) are established earlier than high-frequency (HF) identity details, owing to the nature of diffusion models (DMs). We augment the style and identity contexts via classifier-free guidance (CFG) before and after a shifting timestep $t_0$, respectively, to improve their expressiveness.
  • ...and 15 more figures
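Figure 1 measures identity consistency by pairwise cosine similarity and style variation by variances of DECA attributes, reported separately for intra-class and inter-class comparisons. A minimal sketch of such summary statistics follows. It assumes embeddings come from some face recognition model and attributes from DECA; the helper names are hypothetical, and defining inter-class style variance as the variance of per-class mean attributes is our assumption. The figure plots distributions, whereas these helpers return only means.

```python
import numpy as np


def pairwise_cosine_stats(embeddings, labels):
    """Mean intra-class and inter-class cosine similarity of face embeddings (hypothetical helper)."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # exclude self-similarity
    intra = sim[same & off_diag].mean()
    inter = sim[~same].mean()
    return intra, inter


def style_variance(attrs, labels):
    """Mean within-class variance and variance of class means for style attributes (hypothetical helper)."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    intra = np.mean([attrs[labels == c].var(axis=0).mean() for c in classes])
    class_means = np.stack([attrs[labels == c].mean(axis=0) for c in classes])
    inter = class_means.var(axis=0).mean()
    return intra, inter


# Illustrative toy data: two subjects with two images each.
emb = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
attrs = np.array([[0.0], [0.2], [5.0], [5.2]])
labels = [0, 0, 1, 1]
id_intra, id_inter = pairwise_cosine_stats(emb, labels)
style_intra, style_inter = style_variance(attrs, labels)
```

Higher intra-class similarity together with lower inter-class similarity corresponds to the better-separated curves the Figure 1 caption describes.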