AugGen: Synthetic Augmentation using Diffusion Models Can Improve Recognition

Parsa Rahimi, Damien Teney, Sebastien Marcel

TL;DR

AugGen tackles privacy-sensitive face recognition by training a self-contained, class-conditional diffusion model on the target dataset and generating mixed-identity samples to augment the discriminator's training. It introduces a principled class-mixing scheme, forming new class conditions $\mathbf{c}^{*} = \alpha \mathbf{c}^{i} + \beta \mathbf{c}^{j}$ and selecting $\alpha,\beta$ via grid search to maximize a combined dissimilarity and similarity objective, yielding $\mathrm{D}^{aug}$. The discriminator trained on the mix of real and augmented data exhibits tighter intra-class compactness and stronger inter-class separation, delivering 1–12% gains across 8 benchmarks and often rivaling architectural improvements while using less real data. The results demonstrate that carefully integrated synthetic data can mitigate privacy concerns and meaningfully boost FR performance, though the approach relies on substantial upfront computation and highlights the need for better generative proxy metrics for downstream tasks.
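
The class-mixing step above can be illustrated with a minimal sketch. It assumes the class conditions are one-hot vectors (as the Figure 4 caption indicates) and simply forms $\mathbf{c}^{*} = \alpha \mathbf{c}^{i} + \beta \mathbf{c}^{j}$. The helper names `one_hot` and `mix_conditions`, and the example values of $\alpha$ and $\beta$, are hypothetical and not taken from the paper.

```python
import numpy as np


def one_hot(index: int, num_classes: int) -> np.ndarray:
    """Return the one-hot condition vector for a single source class."""
    c = np.zeros(num_classes, dtype=np.float32)
    c[index] = 1.0
    return c


def mix_conditions(i: int, j: int, alpha: float, beta: float,
                   num_classes: int) -> np.ndarray:
    """Form c* = alpha * c^i + beta * c^j from two source classes.

    Hypothetical helper: the weights are passed in directly; how the
    paper selects them (grid search) is not reproduced here.
    """
    return alpha * one_hot(i, num_classes) + beta * one_hot(j, num_classes)


# Illustrative values only: mix class 0 and class 2 out of 4 classes.
c_star = mix_conditions(0, 2, alpha=0.7, beta=0.7, num_classes=4)
```

A vector like `c_star` would then replace the one-hot condition when sampling from the class-conditional generator, so that the generated images belong to a new, mixed class rather than to either source class.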

Abstract

The increasing reliance on large-scale datasets in machine learning poses significant privacy and ethical challenges, particularly in sensitive domains such as face recognition. Synthetic data generation offers a promising alternative; however, most existing methods depend heavily on external datasets or pre-trained models, increasing complexity and resource demands. In this paper, we introduce AugGen, a self-contained synthetic augmentation technique. AugGen strategically samples from a class-conditional generative model trained exclusively on the target FR dataset, eliminating the need for external resources. Evaluated across 8 FR benchmarks, including IJB-C and IJB-B, our method achieves 1-12% performance improvements, outperforming models trained solely on real data and surpassing state-of-the-art synthetic data generation approaches, while using less real data. Notably, these gains often exceed those from architectural enhancements, underscoring the value of synthetic augmentation in data-limited scenarios. Our findings demonstrate that carefully integrated synthetic data can both mitigate privacy constraints and substantially enhance recognition performance. Paper website: https://parsa-ra.github.io/auggen/.

Paper Structure

This paper contains 37 sections, 4 equations, 14 figures, 12 tables, 2 algorithms.

Figures (14)

  • Figure 1: Core idea of AugGen. AugGen boosts the model’s overall discriminative capabilities without requiring external datasets or pre-trained networks. To achieve this, we propose a novel sampling strategy using a conditional diffusion model trained exclusively on the discriminator’s original data, which enables the generation of synthetic “mixes” of source classes. Incorporating these synthetic samples into the discriminator’s training results in higher intra-class compactness and greater inter-class separation ($\theta_{\mathrm{ours}} > \theta_{\mathrm{baseline}}$) than models trained solely on the original data.
  • Figure 2: Unlike prior methods that depend on external data or pretrained generators, our self-contained synthetic augmentation framework improves recognition purely through its own generative process.
  • Figure 3: Overview diagram of AugGen: (a) A labeled dataset, $\mathrm{D}^{\mathrm{orig}}$, is used to train a class-conditional generator, $G({\bm{\mathsfit{Z}}}, {\bm{c}})$, and a discriminative model, $\mathrm{M}_{\mathrm{orig}}$. (b,d) Reproduced dataset, $\mathrm{D}^{\mathrm{repro}}$, closely mimics $\mathrm{D}^{\mathrm{orig}}$ under the original conditions. (c) We find new condition vectors, ${\bm{C}}^{*}$, to generate an augmented dataset, $\mathrm{D}^{\mathrm{aug}}$, using the generator. (f) Augmenting $\mathrm{D}^{\mathrm{orig}}$ with $\mathrm{D}^{\mathrm{aug}}$ boosts $\mathrm{M}_{\mathrm{orig}}$ performance without auxiliary datasets or models.
  • Figure 4: Randomly sampled images. From left to right: The first column shows variations of a randomly selected identity (ID 1) from $\mathrm{D}^\mathrm{orig}$. The second column presents the reproduction of the same ID using the generator, conditioned on the corresponding one-hot vector $G({\bm{\mathsfit{Z}}}, {\bm{c}}_1)$. The third and fourth columns follow the same process for a different ID, with the middle column representing a newly synthesized identity generated by conditioning the generator on $G({\bm{\mathsfit{Z}}}, {\bm{c}}^{*})$. The samples above the red line are from CASIA-WebFace, while the lower part corresponds to WebFace160K.
  • Figure 5: The value of the proposed measure $m^{\mathrm{total}}$ over the candidate values of $\alpha$ (x-axis) and $\beta$ (y-axis). For each pair of $\alpha$ and $\beta$, we calculated $m^{\mathrm{total}}$ over our 100 combinations of ${\mathbb{L}}_{s}$, setting $K$ in Algorithm \ref{alg:grid_search} to 10.
  • ...and 9 more figures
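
The grid search described in the Figure 5 caption can be sketched roughly as follows. This is only an outline under stated assumptions: the callables `generate` and `measure` are hypothetical placeholders for the generator sampling and for $m^{\mathrm{total}}$, neither of which is defined in this summary, and averaging the score over class pairs is likewise an assumption. The toy functions exist solely to make the example runnable.

```python
import itertools
import numpy as np


def grid_search(alphas, betas, class_pairs, generate, measure, k=10):
    """Score every (alpha, beta) pair and return the best one.

    generate(i, j, alpha, beta, k) -> k samples for the mixed class (placeholder)
    measure(samples, i, j)         -> scalar score, higher is better (placeholder)
    Averaging over class pairs is an assumption of this sketch.
    """
    scores = np.zeros((len(alphas), len(betas)))
    grid = itertools.product(enumerate(alphas), enumerate(betas))
    for (a_idx, alpha), (b_idx, beta) in grid:
        per_pair = [measure(generate(i, j, alpha, beta, k), i, j)
                    for i, j in class_pairs]
        scores[a_idx, b_idx] = np.mean(per_pair)
    a_best, b_best = np.unravel_index(np.argmax(scores), scores.shape)
    return alphas[a_best], betas[b_best], scores


# Toy stand-ins (NOT the paper's generator or measure): the "samples" just
# carry the weights, and the score peaks at alpha = beta = 0.7 by construction.
def toy_generate(i, j, alpha, beta, k):
    return np.tile([alpha, beta], (k, 1))


def toy_measure(samples, i, j):
    return -np.mean(np.sum((samples - 0.7) ** 2, axis=1))


candidates = [0.3, 0.5, 0.7, 0.9]
pairs = [(0, 1), (2, 3)]  # the caption uses 100 combinations; two suffice here
best_alpha, best_beta, score_grid = grid_search(
    candidates, candidates, pairs, toy_generate, toy_measure, k=10)
```

The returned `score_grid` has one row per candidate $\alpha$ and one column per candidate $\beta$, which is the same layout a heatmap like Figure 5 would visualize.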