ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition

Shen Li, Jianqing Xu, Jiaying Wu, Miao Xiong, Ailin Deng, Jiazhen Ji, Yuge Huang, Wenjie Feng, Shouhong Ding, Bryan Hooi

TL;DR

The paper introduces $\text{ID}^3$, a diffusion-fueled synthetic face recognition (SFR) model that uses an ID-preserving loss to generate diverse yet identity-consistent facial appearances, and shows that minimizing this loss is equivalent to maximizing a lower bound of an adjusted conditional log-likelihood over ID-preserving data.

Abstract

Synthetic face recognition (SFR) aims to generate synthetic face datasets that mimic the distribution of real face data, which allows for training face recognition models in a privacy-preserving manner. Despite the remarkable potential of diffusion models in image generation, current diffusion-based SFR models struggle with generalization to real-world faces. To address this limitation, we outline three key objectives for SFR: (1) promoting diversity across identities (inter-class diversity), (2) ensuring diversity within each identity by injecting various facial attributes (intra-class diversity), and (3) maintaining identity consistency within each identity group (intra-class identity preservation). Inspired by these goals, we introduce a diffusion-fueled SFR model termed $\text{ID}^3$. $\text{ID}^3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances. Theoretically, we show that minimizing this loss is equivalent to maximizing the lower bound of an adjusted conditional log-likelihood over ID-preserving data. This equivalence motivates an ID-preserving sampling algorithm, which operates over an adjusted gradient vector field, enabling the generation of fake face recognition datasets that approximate the distribution of real-world faces. Extensive experiments across five challenging benchmarks validate the advantages of $\text{ID}^3$.
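
The abstract describes sampling over an adjusted gradient vector field. As a minimal sketch, assuming a guidance-style reading in which the adjustment adds the gradient of an identity inner-product term to the model's score, the toy below runs unadjusted Langevin dynamics on that summed field. The Gaussian `base_score`, the linear embedding matrix `W`, the toy identity vector `y`, the weight `lam`, and all function names are hypothetical stand-ins chosen so the effect can be checked in closed form; this is not the paper's sampling algorithm.

```python
import numpy as np


def base_score(x, mu):
    """Score of a toy conditional density N(mu, I): grad_x log p(x) = -(x - mu)."""
    return -(x - mu)


def id_guidance(W, y):
    """Gradient w.r.t. x of the inner product <W x, y> for a hypothetical linear
    embedding f(x) = W x. It equals W^T y and does not depend on x; a real
    face-recognition network would need automatic differentiation instead."""
    return W.T @ y


def langevin_sample(mu, W, y, lam=1.0, step=0.05, n_steps=2000, n_chains=4000, seed=0):
    """Unadjusted Langevin dynamics on the adjusted field: base score + lam * identity gradient."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_chains, mu.shape[0]))
    guide = lam * id_guidance(W, y)  # shape (d,), broadcast over all chains
    for _ in range(n_steps):
        drift = base_score(x, mu) + guide
        x = x + 0.5 * step * drift + np.sqrt(step) * rng.standard_normal(x.shape)
    return x


if __name__ == "__main__":
    mu = np.zeros(2)
    W = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])  # maps 2-D "images" to 3-D "embeddings"
    y = np.array([1.0, 0.0, 0.0])                         # toy identity embedding
    samples = langevin_sample(mu, W, y, lam=1.0)
    # The adjusted field is the score of N(mu + lam * W^T y, I),
    # so the sample mean should land near mu + W^T y.
    print(samples.mean(axis=0), mu + 1.0 * W.T @ y)
```

Because the identity gradient in this toy is constant, it simply shifts the stationary distribution from N(mu, I) to N(mu + lam * W^T y, I); with a nonlinear embedding network the extra term would vary with x.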

Paper Structure

This paper contains 26 sections, 2 theorems, 34 equations, 8 figures, 3 tables, and 3 algorithms.

Key Result

Theorem 3.1

Minimizing $\mathcal{L}$ with respect to $\boldsymbol{\theta}$ is equivalent to minimizing an upper bound on an adjusted conditional negative log-likelihood of the data, $-\log \tilde{p}(\mathbf{x} \mid \mathbf{y}, \mathbf{s})$.

Figures (8)

  • Figure 1: The forward pass of $\text{ID}^3$ during loss computation. Given an image, its face attributes, and its face embedding, $\text{ID}^3$ computes the noised version of the image after $t$ diffusion steps and uses a denoising network to denoise it, conditioning on the predicted attributes and the ID embedding. Training minimizes a loss composed of a denoising term, a one-step reconstruction term, an inner-product term, and a constant.
  • Figure 2: Synthetic Dataset Generation
  • Figure 3: Uncurated samples generated by $\text{ID}^3$ (Top) and those by IDiff-Face (Bottom).
  • Figure : Training Algorithm
  • Figure A.1: An illustration of the dataset-generating algorithm.
  • ...and 3 more figures
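
Figure 1's description of the training forward pass can be mirrored in a short NumPy sketch under several assumptions: a standard DDPM linear noise schedule, a linear `toy_denoiser` in place of the conditional denoising network, a linear-plus-normalize `toy_embed` in place of the face-recognition encoder, the one-step reconstruction term taken as the squared error of the clean-image estimate from the first diffusion step, and the inner-product term taken as the negative inner product between the reconstruction's embedding and the ID embedding. The weights `w_rec`, `w_ip`, and `const` and every function name are hypothetical; this is not the paper's exact objective.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """DDPM linear beta schedule; alpha_bar[t] = prod_{k<=t} (1 - beta_k)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def toy_denoiser(x_t, t_frac, attrs, id_emb, params):
    """Hypothetical noise predictor eps_theta(x_t, t, attrs, id_emb): a linear map of
    the concatenated inputs, standing in for a conditional U-Net."""
    h = np.concatenate([x_t, attrs, id_emb, [t_frac]])
    return params @ h


def toy_embed(x, E):
    """Hypothetical face embedding f(x): linear projection followed by L2 normalization."""
    z = E @ x
    return z / np.linalg.norm(z)


def id3_style_loss(x0, attrs, id_emb, t, params, E, alpha_bar, rng,
                   w_rec=1.0, w_ip=1.0, const=0.0):
    T = len(alpha_bar)

    def noise_and_denoise(t_idx):
        # Forward diffusion q(x_t | x_0), then predict the noise and estimate x_0 in one step.
        ab = alpha_bar[t_idx]
        eps = rng.standard_normal(x0.shape)
        x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
        eps_hat = toy_denoiser(x_t, t_idx / T, attrs, id_emb, params)
        x0_hat = (x_t - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)
        return eps, eps_hat, x0_hat

    eps, eps_hat, x0_hat = noise_and_denoise(t)
    l_denoise = np.mean((eps - eps_hat) ** 2)      # denoising term (noise-prediction MSE)
    _, _, x0_hat_1 = noise_and_denoise(0)           # estimate from the first diffusion step
    l_rec = np.mean((x0 - x0_hat_1) ** 2)          # one-step reconstruction term (assumed MSE)
    l_ip = -np.dot(toy_embed(x0_hat, E), id_emb)   # inner-product term (assumed form)
    return l_denoise + w_rec * l_rec + w_ip * l_ip + const


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, d_attr, d_id = 8, 3, 4
    x0 = rng.standard_normal(d)
    attrs = rng.standard_normal(d_attr)
    id_emb = rng.standard_normal(d_id)
    id_emb /= np.linalg.norm(id_emb)
    params = 0.01 * rng.standard_normal((d, d + d_attr + d_id + 1))
    E = rng.standard_normal((d_id, d))
    print(float(id3_style_loss(x0, attrs, id_emb, 500, params, E, linear_alpha_bar(), rng)))
```

The constant has no effect on gradients, so in practice only the first three terms drive training; keeping it explicit simply matches the decomposition listed in the caption.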

Theorems & Definitions (5)

  • Theorem 3.1
  • Proof
  • Lemma A.1
  • Proof
  • Proof