Increasing the Utility of Synthetic Images through Chamfer Guidance
Nicola Dall'Asen, Xiaofeng Zhang, Reyhane Askari Hemmat, Melissa Hall, Jakob Verbeek, Adriana Romero-Soriano, Michal Drozdzal
TL;DR
This work introduces Chamfer Guidance, a training-free method that uses a small set of real exemplars to ground the diversity and quality of synthetic images generated by conditional diffusion models. By formulating image distribution matching as a Chamfer distance between real and generated sets in a semantic feature space (e.g., DINOv2), and applying inference-time guidance, the approach achieves superior fidelity and grounded diversity without classifier-free guidance (CFG) and with substantial compute savings. Across the object-centric ImageNet-1k benchmark and the geographic GeoDE and DollarStreet benchmarks, Chamfer Guidance yields state-of-the-art or near-state-of-the-art results, scales effectively with more exemplars, and improves the accuracy of downstream classifiers trained on the synthetic data. The method also demonstrates favorable out-of-distribution (OOD) generalization and reduces memory and compute overhead relative to prior context-guided methods, making synthetic data more practically useful for training and evaluation.
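To make the set-level distance concrete, the following is a minimal sketch of one common form of the symmetric Chamfer distance between two sets of feature vectors. It is an illustration under stated assumptions, not the paper's implementation: the function name `chamfer_distance`, the use of squared Euclidean distance, and the mean normalization are all choices made here, and the toy 2-D vectors stand in for embeddings from a feature extractor such as DINOv2.

```python
import numpy as np


def chamfer_distance(real_feats, gen_feats):
    """Symmetric Chamfer distance between two feature sets (illustrative).

    real_feats: array of shape (n, d), features of real exemplars.
    gen_feats:  array of shape (m, d), features of generated images.
    Returns (total, gen_to_real, real_to_gen).
    Assumption: squared Euclidean distance with mean normalization;
    the paper's exact metric and weighting may differ.
    """
    real = np.asarray(real_feats, dtype=np.float64)
    gen = np.asarray(gen_feats, dtype=np.float64)

    # Pairwise squared distances via broadcasting, shape (n, m).
    diff = real[:, None, :] - gen[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)

    # For each generated sample, distance to its nearest real exemplar.
    # Large values mean generations drift away from the real data.
    gen_to_real = d2.min(axis=0).mean()

    # For each real exemplar, distance to its nearest generated sample.
    # Large values mean some exemplars are not covered by any generation.
    real_to_gen = d2.min(axis=1).mean()

    return gen_to_real + real_to_gen, gen_to_real, real_to_gen


# Toy example: two real exemplars, two generated sets.
real = np.array([[0.0, 0.0], [1.0, 0.0]])
collapsed = np.array([[0.0, 0.0], [0.0, 0.0]])  # all samples on one mode
diverse = np.array([[0.0, 0.0], [1.0, 0.0]])    # one sample per exemplar

total_collapsed, g2r_collapsed, r2g_collapsed = chamfer_distance(real, collapsed)
total_diverse, _, _ = chamfer_distance(real, diverse)
print(total_collapsed, total_diverse)
```

In this toy case the collapsed set scores well on the generated-to-real term, since every sample sits on an exemplar, but is penalized on the real-to-generated term because one exemplar has no nearby sample. This is the intuition behind using a two-sided distance to ground both quality and diversity in a handful of real images.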
Abstract
Conditional image generative models hold considerable promise to produce infinite amounts of synthetic training data. Yet, recent progress in generation quality has come at the expense of generation diversity, limiting the utility of these models as a source of synthetic training data. Although guidance-based approaches have been introduced to improve the utility of generated data by focusing on quality or diversity, the (implicit or explicit) utility functions often disregard the potential distribution shift between synthetic and real data. In this work, we introduce Chamfer Guidance: a training-free guidance approach that leverages a handful of real exemplar images to characterize the quality and diversity of synthetic data. We show that with the proposed Chamfer Guidance, we can boost the diversity of the generations w.r.t. a dataset of real images while maintaining or improving the generation quality on ImageNet-1k and standard geo-diversity benchmarks. Our approach achieves state-of-the-art few-shot performance with as few as 2 real exemplar images, obtaining 96.4% precision and 86.4% distributional coverage, which increase to 97.5% and 92.7%, respectively, when using 32 real images. We showcase the benefits of Chamfer Guidance generation by training downstream image classifiers on synthetic data, achieving an accuracy boost over the baselines of up to 15% in-distribution and up to 16% out-of-distribution. Furthermore, our approach does not require the unconditional model, and thus obtains a 31% reduction in FLOPs w.r.t. classifier-free-guidance-based approaches at sampling time.
