CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation
Letian Zhou, Songhua Liu, Xinchao Wang
TL;DR
CoDA is a training-free dataset distillation framework that replaces target-dataset pretraining of diffusion models with a two-stage pipeline. Distribution Discovery uses density-based, per-class clustering (UMAP+HDBSCAN) to identify an intrinsic core distribution Sr. Distribution Alignment then steers an off-the-shelf text-to-image diffusion model to generate the required number of images per class (IPC) as representatives aligned with Sr, modifying the denoising process via a learned guidance term. The approach reports state-of-the-art results across seven ImageNet subsets, 60.4% accuracy on ImageNet-1K at 50 IPC, and strong cross-architecture performance on ImageNet-A–E, while enabling zero-shot domain transfer and robust performance under noisy or limited labels. These results highlight the practical value of bridging general-purpose generative priors with target semantics without expensive target-specific training, offering scalable, domain-adaptive dataset distillation. CoDA's versatility across encoders and diffusion architectures further suggests its potential to change how training data is prepared for large-scale vision tasks.
Abstract
Prevailing Dataset Distillation (DD) methods leveraging generative models confront two fundamental limitations. First, despite pioneering the use of diffusion models in DD and delivering impressive performance, the vast majority of approaches paradoxically require a diffusion model pre-trained on the full target dataset, undermining the very purpose of DD and incurring prohibitive training costs. Second, although some methods turn to general text-to-image models without relying on such target-specific training, they suffer from a significant distributional mismatch, as the web-scale priors encapsulated in these foundation models fail to faithfully capture the target-specific semantics, leading to suboptimal performance. To tackle these challenges, we propose Core Distribution Alignment (CoDA), a framework that enables effective DD using only an off-the-shelf text-to-image model. Our key idea is to first identify the "intrinsic core distribution" of the target dataset using a robust density-based discovery mechanism. We then steer the generative process to align the generated samples with this core distribution. By doing so, CoDA effectively bridges the gap between general-purpose generative priors and target semantics, yielding highly representative distilled datasets. Extensive experiments suggest that, without relying on a generative model specifically trained on the target dataset, CoDA achieves performance on par with or even superior to previous methods with such reliance across all benchmarks, including ImageNet-1K and its subsets. Notably, it establishes a new state-of-the-art accuracy of 60.4% at the 50-images-per-class (IPC) setup on ImageNet-1K. Our code is available on the project webpage: https://github.com/zzzlt422/CoDA
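To make the idea of a density-based "core distribution" concrete, the following is a minimal sketch of per-class core discovery. It is not the authors' implementation: the UMAP+HDBSCAN step is swapped for a plain k-nearest-neighbour density score so that the example depends only on NumPy, and the names `discover_core`, `core_per_class`, `k`, and `keep_fraction` are hypothetical, introduced purely for illustration. The features are assumed to be precomputed image embeddings.

```python
import numpy as np


def discover_core(features, k=5, keep_fraction=0.5):
    """Return sorted indices of the densest samples within a single class.

    Assumption: density is approximated by the mean distance to the k
    nearest neighbours (smaller means denser). This stands in for the
    UMAP+HDBSCAN clustering named in the text.
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    if n <= k:
        raise ValueError("need more than k samples per class")

    # Pairwise Euclidean distances, shape (n, n).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # After sorting each row, column 0 is the zero self-distance; skip it.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    score = knn.mean(axis=1)

    # Keep the requested fraction of samples with the lowest score.
    n_keep = max(1, int(round(keep_fraction * n)))
    return np.sort(np.argsort(score)[:n_keep])


def core_per_class(features, labels, k=5, keep_fraction=0.5):
    """Apply discover_core to each class; map class id -> dataset indices."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    cores = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        local = discover_core(features[idx], k=k, keep_fraction=keep_fraction)
        cores[int(c)] = idx[local]
    return cores
```

Calling `core_per_class(embeddings, labels)` returns, for each class, the indices of the samples lying in the densest region of that class, which would then serve as the alignment target for generation. A clustering-based method such as HDBSCAN additionally handles multi-modal classes and labels noise points explicitly, which this simplified score does not.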
