Increasing the Utility of Synthetic Images through Chamfer Guidance

Nicola Dall'Asen, Xiaofeng Zhang, Reyhane Askari Hemmat, Melissa Hall, Jakob Verbeek, Adriana Romero-Soriano, Michal Drozdzal

TL;DR

This work introduces Chamfer Guidance, a training-free method that uses a small set of real exemplars to ground the diversity and quality of synthetic images generated by conditional diffusion systems. By formulating image distribution matching as a Chamfer distance between real and generated sets in a semantic feature space (e.g., DINOv2), and applying inference-time guidance, the approach achieves superior fidelity and grounded diversity without CFG and with substantial compute savings. Across ImageNet-1k object-centric and GeoDE/DollarStreet geographic benchmarks, Chamfer Guidance yields state-of-the-art or near-state-of-the-art results, scales effectively with more exemplars, and improves downstream classifier accuracy when trained on synthetic data. The method also demonstrates favorable OOD generalization and reduces memory/compute overhead relative to prior context-guided methods, making synthetic data more practically useful for training and evaluation.
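To make the set-to-set objective concrete, here is a minimal sketch of a symmetric Chamfer distance between two sets of feature vectors. It is only an illustration under assumptions: the function name `chamfer_distance`, the squared Euclidean metric, and the unweighted sum of the two directional terms are choices made here, not details confirmed by this summary, and the feature extractor (e.g., DINOv2) is replaced by plain arrays.

```python
import numpy as np


def chamfer_distance(real_feats, gen_feats):
    """Symmetric Chamfer distance between two feature sets (illustrative only).

    real_feats: array of shape (k, d), features of the real exemplars.
    gen_feats:  array of shape (n, d), features of the generated images.
    Assumption: squared Euclidean distance and equal weighting of both terms.
    """
    real = np.asarray(real_feats, dtype=float)
    gen = np.asarray(gen_feats, dtype=float)

    # Pairwise squared distances, shape (n, k).
    diff = gen[:, None, :] - real[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)

    # Each generated sample to its nearest real exemplar.
    gen_to_real = sq_dists.min(axis=1).mean()
    # Each real exemplar to its nearest generated sample.
    real_to_gen = sq_dists.min(axis=0).mean()
    return gen_to_real + real_to_gen


# Toy example: two real exemplars, two generated samples collapsed onto one of them.
real = np.array([[0.0, 0.0], [1.0, 0.0]])
collapsed = np.array([[0.0, 0.0], [0.0, 0.0]])

print(chamfer_distance(real, collapsed))  # 0.5: the second exemplar is left uncovered
print(chamfer_distance(real, real))       # 0.0: the sets match exactly
```

In this toy case the generated-to-real term is zero because every sample sits on an exemplar, while the real-to-generated term is nonzero because one exemplar has no nearby sample. One way to read this is that the first term tracks fidelity and the second tracks coverage of the exemplars, which is the sense in which diversity is grounded in real data.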

Abstract

Conditional image generative models hold considerable promise to produce infinite amounts of synthetic training data. Yet, recent progress in generation quality has come at the expense of generation diversity, limiting the utility of these models as a source of synthetic training data. Although guidance-based approaches have been introduced to improve the utility of generated data by focusing on quality or diversity, their (implicit or explicit) utility functions often disregard the potential distribution shift between synthetic and real data. In this work, we introduce Chamfer Guidance: a training-free guidance approach that leverages a handful of real exemplar images to characterize the quality and diversity of synthetic data. We show that Chamfer Guidance boosts the diversity of the generations w.r.t. a dataset of real images while maintaining or improving the generation quality on ImageNet-1k and standard geo-diversity benchmarks. Our approach achieves state-of-the-art few-shot performance with as few as 2 real exemplar images, obtaining 96.4% precision and 86.4% distributional coverage, which increase to 97.5% and 92.7%, respectively, when using 32 real images. We showcase the benefits of Chamfer Guidance by training downstream image classifiers on synthetic data, achieving an accuracy boost over the baselines of up to 15% in-distribution and up to 16% out-of-distribution. Furthermore, our approach does not require the unconditional model, and thus obtains a 31% reduction in FLOPs w.r.t. classifier-free-guidance-based approaches at sampling time.

Paper Structure

This paper contains 31 sections, 8 equations, 7 figures, 13 tables.

Figures (7)

  • Figure 1: Our Chamfer Guidance addresses key limitations of existing image generation approaches, producing high-quality and diverse outputs. Base models (here LDM3.5M) necessitate high CFG scales to achieve prompt adherence and quality, at the expense of diversity. Reference-free methods can introduce ungrounded diversity, failing to capture the underlying data distribution. While training-based solutions effectively narrow the fidelity gap with the reference distribution, they suffer from low subject diversity, particularly in backgrounds. Our Chamfer Guidance achieves superior image quality without using CFG, substantially improving grounded coverage (C) and aligning the generated images more precisely (P) with the reference distribution. Best viewed zoomed in.
  • Figure 2: Effect of the number of real reference samples $k$ on LDM1.5 and LDM3.5M for ImageNet-1k. Only our Chamfer Guidance effectively leverages the increased number of reference images, consistently obtaining favorable trends across Coverage, Precision, and FDD.
  • Figure 3: Example of user study question.
  • Figure 4: LDM1.5 generations on ImageNet-1k with different $\omega$ values. $k=32$, $\gamma=0.07$ for our Chamfer Guidance. The classes are, from top to bottom: container ship, pelican, brambling, and Dutch oven.
  • Figure 5: LDM3.5M generations on ImageNet-1k with different $\omega$ and $\gamma$ values, $k=32$. The classes are, from top to bottom: Irish wolfhound and hamster.
  • ...and 2 more figures