ScoreMix: Synthetic Data Generation by Score Composition in Diffusion Models Improves Recognition
Parsa Rahimi, Sebastien Marcel
TL;DR
ScoreMix is a self-contained augmentation strategy for discriminative face recognition that relies solely on the target dataset. It exploits score compositionality in diffusion models: class-conditioned scores are convexly mixed during reverse diffusion to generate hard, on-manifold synthetic samples without external data. Across eight benchmarks, ScoreMix improves accuracy by up to about 7 percentage points, is robust to backbone changes, and requires no hyperparameter search. Using alignment metrics such as CKA and CKNNA, it also examines how the generator's condition space relates to the discriminator's embedding space. The gains come at a higher sampling cost, and the work points to a need for further research on aligning generative and discriminative spaces without sacrificing diversity.
Abstract
Synthetic data generation is increasingly used in machine learning for training and data augmentation. Yet current strategies often rely on external foundation models or datasets, whose use is restricted in many scenarios due to policy or legal constraints. We propose ScoreMix, a self-contained synthetic generation method that produces hard synthetic samples for recognition tasks by leveraging the score compositionality of diffusion models. The approach mixes class-conditioned scores along reverse diffusion trajectories, yielding domain-specific data augmentation without external resources. We systematically study class-selection strategies and find that mixing classes that are distant in the discriminator's embedding space yields larger gains, providing up to 3% additional average improvement over proximity-based selection. Interestingly, we observe that the condition and embedding spaces are largely uncorrelated under standard alignment metrics, and that the generator's condition space has a negligible effect on downstream performance. Across 8 public face recognition benchmarks, ScoreMix improves accuracy by up to 7 percentage points without hyperparameter search, highlighting both robustness and practicality. Our method provides a simple yet effective way to maximize discriminator performance using only the available dataset, without reliance on third-party resources. Paper website: https://parsa-ra.github.io/scoremix/.
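The two ideas in the abstract, convex mixing of class-conditioned scores and choosing a partner class that is distant in the discriminator's embedding space, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the use of cosine similarity as the distance measure, the fixed mixing weight `lam`, and the Gaussian stand-in for a learned score network are all assumptions made for illustration.

```python
import math

def mix_scores(score_a, score_b, lam):
    """Convex combination lam * s_a + (1 - lam) * s_b of two score vectors."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(score_a, score_b)]

def gaussian_score(x, mean, sigma=1.0):
    """Toy stand-in for a class-conditioned score model (assumption):
    the gradient of log N(x; mean, sigma^2 I) with respect to x."""
    return [-(xi - mi) / sigma**2 for xi, mi in zip(x, mean)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pick_distant_partner(centroids, i):
    """Index of the class whose centroid is least cosine-similar to class i.
    Cosine similarity is an assumed choice of distance in embedding space."""
    others = [j for j in range(len(centroids)) if j != i]
    return min(others, key=lambda j: cosine(centroids[i], centroids[j]))

# Hypothetical per-class centroids in the discriminator's embedding space.
# For simplicity they double as the means of the toy Gaussian scores.
class_means = [[2.0, 0.0], [1.5, 1.0], [-2.0, 0.5]]

a = 0
b = pick_distant_partner(class_means, a)  # farthest class from class 0

# One evaluation of the mixed score at a point x. In a real sampler this
# would replace the single-class score at every reverse diffusion step.
x = [0.0, 0.0]
mixed = mix_scores(
    gaussian_score(x, class_means[a]),
    gaussian_score(x, class_means[b]),
    lam=0.5,
)
```

In this toy setting, with equal variances, the mixed score coincides with the score of a Gaussian centred at the convex combination of the two class means, which gives some intuition for why mixed samples land between the two classes. A learned diffusion model carries no such guarantee.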
