CONCORD: Concept-Informed Diffusion for Dataset Distillation
Jianyang Gu, Haonan Wang, Ruoxi Jia, Saeed Vahidian, Vyacheslav Kungurtsev, Wei Jiang, Yiran Chen
TL;DR
CONCORD introduces concept-informed diffusion to dataset distillation: fine-grained category concepts derived from an LLM guide the diffusion denoising process, addressing the instance-level concept completeness that prior methods overlook. Concepts are acquired and validated with CLIP, and a contrastive matching objective that includes negative concepts gives explicit, interpretable control over generated samples without requiring pre-trained classifiers. Empirical results on ImageNet-1K, its subsets, and Food-101 show state-of-the-art surrogate data quality and improved downstream performance across multiple baselines and images-per-class (IPC) settings. The method offers practical benefits for researchers with limited resources and improves the reliability of distilled datasets for training robust models. Limitations include additional computational cost and potential challenges for few-step diffusion, which point to more efficient inference as a direction for future work.
Abstract
Dataset distillation (DD) has made significant progress in creating small datasets that encapsulate rich information from large original ones. In particular, methods based on generative priors show promising performance while maintaining computational efficiency and cross-architecture generalization. However, the generation process lacks explicit controllability for each sample. Previous distillation methods primarily match the real distribution from the perspective of the entire dataset, while overlooking concept completeness at the instance level. Missing or incorrectly represented object details cannot be efficiently compensated for, because DD settings allow only a small number of samples. To this end, we propose incorporating the concept understanding of large language models (LLMs) to perform Concept-Informed Diffusion (CONCORD) for dataset distillation. Specifically, distinguishable and fine-grained concepts are retrieved based on category labels to inform the denoising process and refine essential object details. By integrating these concepts, the proposed method significantly enhances both the controllability and interpretability of distilled image generation, without relying on pre-trained classifiers. We demonstrate the efficacy of CONCORD by achieving state-of-the-art performance on ImageNet-1K and its subsets. The code is available at https://github.com/vimar-gu/CONCORD.
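
To make the idea of contrastive concept matching with negative concepts more concrete, the following is a minimal sketch and not the released implementation. It assumes that an image and each concept have already been embedded into a shared space (for example by CLIP), and it uses an InfoNCE-style loss as a stand-in objective. The function names, the loss form, and the temperature value are illustrative assumptions; they are not taken from the paper or the repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def concept_contrastive_loss(image_emb, positive_embs, negative_embs,
                             temperature=0.07):
    """Hypothetical InfoNCE-style loss over concept embeddings.

    The loss is small when the image embedding is close to the concepts
    of its own category (positives) and far from the concepts of other
    categories (negatives). The temperature is an assumed value.
    """
    pos = [math.exp(cosine(image_emb, p) / temperature) for p in positive_embs]
    neg = [math.exp(cosine(image_emb, n) / temperature) for n in negative_embs]
    return -math.log(sum(pos) / (sum(pos) + sum(neg)))


# Toy 2-D embeddings standing in for real image/text features.
positives = [[1.0, 0.0]]   # concept of the target category
negatives = [[0.0, 1.0]]   # concept of a different category

loss_aligned = concept_contrastive_loss([1.0, 0.0], positives, negatives)
loss_misaligned = concept_contrastive_loss([0.0, 1.0], positives, negatives)
```

In this toy setup, `loss_aligned` is lower than `loss_misaligned`, which is the behavior a concept-matching signal needs: during generation, a signal of this kind could be used to push a sample toward its own category's concepts and away from those of other categories. How CONCORD actually applies such a signal inside the denoising process is specified in the paper and code, not in this sketch.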
