ScoreMix: Synthetic Data Generation by Score Composition in Diffusion Models Improves Recognition

Parsa Rahimi, Sebastien Marcel

TL;DR

ScoreMix is a self-contained augmentation strategy for discriminative face recognition that relies solely on the target dataset. It exploits score composition in diffusion models: class-conditioned scores are convexly mixed during reverse diffusion, which generates hard, on-manifold synthetic samples without external data. Across eight benchmarks, ScoreMix improves accuracy by up to about 7 percentage points, is robust to backbone changes, and requires no hyperparameter search. It also offers theoretical insight into the geometry of class-space relationships via alignment metrics such as CKA and CKNNA. The approach incurs a higher sampling cost, and it points to a need for further work on aligning generative and discriminative spaces without sacrificing diversity.
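The convex score mixing described above can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: analytic Gaussian scores stand in for a trained class-conditional diffusion model, an annealed Langevin loop stands in for the reverse diffusion sampler, and the names `gaussian_score`, `mixed_score`, and `sample_scoremix` are hypothetical. The paper's actual mixing equation (including its $\beta$ parameter and AutoGuidance) is not reproduced here.

```python
import numpy as np

def gaussian_score(x, mean, sigma):
    """Score (gradient of the log density) of an isotropic Gaussian N(mean, sigma^2 I)."""
    return (mean - x) / sigma**2

def mixed_score(x, sigma, mean_a, mean_b, alpha):
    """Convex combination of two class-conditioned scores, with alpha in [0, 1]."""
    score_a = gaussian_score(x, mean_a, sigma)  # toy stand-in for s(x, t | class A)
    score_b = gaussian_score(x, mean_b, sigma)  # toy stand-in for s(x, t | class B)
    return alpha * score_a + (1.0 - alpha) * score_b

def sample_scoremix(mean_a, mean_b, alpha, n_steps=200, step=0.05, seed=0):
    """Annealed Langevin-style sampler driven by the mixed score (assumed sampler)."""
    rng = np.random.default_rng(seed)
    x = 3.0 * rng.standard_normal(mean_a.shape)   # start from broad noise
    for sigma in np.geomspace(3.0, 0.05, n_steps):  # decreasing noise levels
        eps = step * sigma**2
        noise = rng.standard_normal(x.shape)
        x = x + eps * mixed_score(x, sigma, mean_a, mean_b, alpha) + np.sqrt(2.0 * eps) * noise
    return x
```

In this toy setting, `sample_scoremix(mean_a, mean_b, alpha=0.5)` returns a point near the midpoint of the two class means. With a real diffusion model the mixed trajectory need not be a simple interpolation.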

Abstract

Synthetic data generation is increasingly used in machine learning for training and data augmentation. Yet, current strategies often rely on external foundation models or datasets, whose usage is restricted in many scenarios due to policy or legal constraints. We propose ScoreMix, a self-contained synthetic generation method to produce hard synthetic samples for recognition tasks by leveraging the score compositionality of diffusion models. The approach mixes class-conditioned scores along reverse diffusion trajectories, yielding domain-specific data augmentation without external resources. We systematically study class-selection strategies and find that mixing classes distant in the discriminator's embedding space yields larger gains, providing up to 3% additional average improvement, compared to selection based on proximity. Interestingly, we observe that condition and embedding spaces are largely uncorrelated under standard alignment metrics, and the generator's condition space has a negligible effect on downstream performance. Across 8 public face recognition benchmarks, ScoreMix improves accuracy by up to 7 percentage points, without hyperparameter search, highlighting both robustness and practicality. Our method provides a simple yet effective way to maximize discriminator performance using only the available dataset, without reliance on third-party resources. Paper website: https://parsa-ra.github.io/scoremix/.
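The class-selection finding can be made concrete with a small sketch that pairs each class with the class farthest from it (or nearest to it) in the discriminator's embedding space. This is only an illustration under stated assumptions: cosine similarity between L2-normalized class centers is assumed as the distance measure, and `class_centers` and `pick_partner` are hypothetical helper names, not functions from the paper.

```python
import numpy as np

def class_centers(embeddings, labels):
    """Return sorted class ids and the unit-norm mean embedding of each class."""
    classes = np.unique(labels)
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centers = np.stack([unit[labels == c].mean(axis=0) for c in classes])
    centers = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return classes, centers

def pick_partner(classes, centers, farthest=True):
    """Map each class to its farthest (or nearest) other class by cosine similarity."""
    sim = centers @ centers.T
    # Exclude self-pairs: push the diagonal out of reach of argmin / argmax.
    np.fill_diagonal(sim, np.inf if farthest else -np.inf)
    idx = sim.argmin(axis=1) if farthest else sim.argmax(axis=1)
    return dict(zip(classes.tolist(), classes[idx].tolist()))
```

The returned dictionary can then drive which pairs of class conditions are mixed during generation. `farthest=True` corresponds to the distant-class strategy that the abstract reports as more effective.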

Paper Structure

This paper contains 60 sections, 5 theorems, 35 equations, 16 figures, 5 tables, 2 algorithms.

Key Result

Theorem G.1

Let $X,Y\in\mathbb{R}^{n\times d}$ and define the centered Gram matrices with $H = I-\tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$. Normalize $\widehat{K} \coloneqq K/\|K\|_F$, $\widehat{L} \coloneqq L/\|L\|_F$, and define the (linear) CKA For distinct indices $(i,j,k)$, define the squared-Euclidean triplet mask $T_{i;jk}\in\mathbb{S}^n$ by and $0$ elsewhere. Let $\mathcal{S}_c \coloneqq \{ M \in \m
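The theorem statement above is truncated in this summary, so the following is only a sketch of the standard linear CKA computation that it appears to build on. It assumes the usual linear-kernel Gram matrices $K = HXX^\top H$ and $L = HYY^\top H$ and takes CKA to be the Frobenius inner product of the normalized matrices $\widehat{K}$ and $\widehat{L}$. The function names are hypothetical, and the triplet-mask part of the theorem is not covered.

```python
import numpy as np

def centered_gram(X):
    """Centered linear Gram matrix H X X^T H, with H = I - (1/n) 1 1^T."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ (X @ X.T) @ H

def linear_cka(X, Y):
    """Linear CKA as the Frobenius inner product of Frobenius-normalized Gram matrices."""
    K = centered_gram(X)
    L = centered_gram(Y)
    K_hat = K / np.linalg.norm(K, "fro")
    L_hat = L / np.linalg.norm(L, "fro")
    return float(np.sum(K_hat * L_hat))
```

Under these assumptions the value lies in $[0, 1]$ and is unchanged by rotating or isotropically rescaling either representation, which is why CKA is commonly used to compare spaces such as a generator's condition space and a discriminator's embedding space.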

Figures (16)

  • Figure 1: ScoreMix. Adding carefully generated synthetic augmentations to the original training set boosts the discriminator’s performance, without relying on other sources of information (right). The first two subplots on the left show diffusion trajectories obtained under two different conditioning signals (Cond A/B). Using convex combinations of their score functions (ScoreMix A,B), we generate synthetic samples that interpolate between the two trajectories.
  • Figure 2: Effect of mixing scores in ScoreMix. Each cell shows the image produced for one pair of inputs while sweeping $\alpha$ (horizontal, left$\rightarrow$right) and $\beta$ (vertical, top$\rightarrow$bottom). Randomness is fixed across images.
  • Figure 3: Qualitative comparison of ScoreMix augmentation. Rows show Orig ID1, Repro ID1, ScoreMix (Eq. \ref{eq:score_mix}, AutoGuidance=1.3), Repro ID2, and Orig ID2. The center column provides augmented samples whose subtle deviations from the original ones improve discriminator performance.
  • Figure 4: Geometry preservation of various spaces measured using CKA during the training of the generator.
  • Figure 5: Alignment loss to class-centers before and after applying alignment regularization during the training of the generator.
  • ...and 11 more figures

Theorems & Definitions (10)

  • Conjecture 4.2
  • Theorem G.1: CKA and local-order preservation under $\widehat{K}$-orthogonal, energy-matched Gaussian misalignment
  • proof
  • Corollary G.2: Unnormalized form
  • Corollary G.3: Universal lower bound
  • Remark 1: On centering and the choice of $T_c$
  • Remark 2: Alternative residual model
  • Corollary G.4: Cosine similarity case: exact bound and universal lower bound
  • proof
  • Corollary G.5: Kernel-induced triplet margins