MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models

Jeffrey A. Chan-Santiago, Praveen Tirupattur, Gaurav Kumar Nayak, Gaowen Liu, Mubarak Shah

TL;DR

This work tackles dataset distillation by extracting diverse, representative samples from a fixed, pre-trained diffusion model without distillation-loss fine-tuning. It introduces Mode Discovery to identify data modes, Mode Guidance to steer sampling toward each mode during the diffusion denoising process, and Stop Guidance to preserve sample quality while maintaining diversity. The approach achieves state-of-the-art accuracy across ImageNette, ImageIDC, ImageNet-100, and ImageNet-1K under hard- and soft-label protocols, while significantly reducing computational cost compared with fine-tuning-based methods. It further demonstrates compatibility with multiple diffusion backbones, including text-to-image models, broadening practical deployments for efficient dataset distillation in resource-constrained settings.

Abstract

Dataset distillation has emerged as an effective strategy, significantly reducing training costs and facilitating more efficient model deployment. Recent advances have leveraged generative models to distill datasets by capturing the underlying data distribution. However, existing methods require fine-tuning the model with distillation losses to encourage diversity and representativeness, and even then they do not guarantee sample diversity, which limits their performance. We propose a mode-guided diffusion model leveraging a pre-trained diffusion model without the need to fine-tune with distillation losses. Our approach addresses dataset diversity in three stages: Mode Discovery to identify distinct data modes, Mode Guidance to enhance intra-class diversity, and Stop Guidance to mitigate artifacts in synthetic samples that affect performance. Our approach outperforms state-of-the-art methods, achieving accuracy gains of 4.4%, 2.9%, 1.6%, and 1.6% on ImageNette, ImageIDC, ImageNet-100, and ImageNet-1K, respectively. Our method eliminates the need for fine-tuning diffusion models with distillation losses, significantly reducing computational costs. Our code is available on the project webpage: https://jachansantiago.github.io/mode-guided-distillation/
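
The Mode Discovery stage can be illustrated with a small sketch. The abstract does not say how modes are estimated, so the snippet below assumes plain k-means over the latent vectors of a single class; the function name `discover_modes` and all of its parameters are hypothetical and not taken from the authors' code.

```python
import numpy as np

def discover_modes(latents, n_modes, n_iters=50, seed=0):
    """Estimate n_modes centroids for one class with plain k-means.

    Assumption: k-means stands in for whatever estimator the paper uses.
    latents: array of shape (num_samples, dim).
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from distinct, randomly chosen data points.
    init_idx = rng.choice(len(latents), size=n_modes, replace=False)
    centroids = latents[init_idx].astype(float)
    for _ in range(n_iters):
        # Assign every latent to its nearest centroid.
        dists = np.linalg.norm(latents[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if it has none.
        for k in range(n_modes):
            members = latents[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids, labels
```

Each returned centroid would then play the role of one target mode per synthetic image, so a budget of N images per class corresponds to `n_modes=N` under this assumption.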

Paper Structure

This paper contains 21 sections, 7 equations, 11 figures, 10 tables, 1 algorithm.

Figures (11)

  • Figure 1: Optimization-based Dataset Distillation: Optimizes the distilled dataset to match the gradient/feature statistics of the original dataset. Generative Dataset Distillation: First learns the distribution of the original dataset, then samples a dataset that approximates that distribution.
  • Figure 2: Overview of the gradient field (score function) during the denoising process in latent diffusion for a specific class $c$. The original data distribution, marked by blue dots, shows denser regions (orange shadow) in the gradient field. To generate an image $\hat{X}_i$, noise $x_T^i \sim \mathcal{N}(0, \mathbf{I})$ is sampled. In (a), a pre-trained diffusion model demonstrates imbalanced mode likelihood, leading to limited sample diversity and repeated modes. (b) shows MinMax Diffusion, which fine-tunes the model to enhance diversity by balancing mode likelihoods, but still faces redundancies depending on the initial noise. In (c), the proposed method introduces mode guidance in the denoising process (green and red traces), directing samples towards distinct modes (stars). After $k$ steps of guidance, it transitions to unguided denoising (black trace), achieving high diversity and consistency without the need for fine-tuning.
  • Figure 3: Overview of the proposed method for distilled dataset synthesis using a diffusion model. Our approach consists of three key stages: Mode Discovery, Mode Guidance, and Stop Guidance. (Left) In the Mode Discovery stage, we estimate the $N$ modes of the original dataset within the generative space of the latent diffusion model. (Right) Given a mode $m_{target}$ and a class $c$, the Mode-Guided Diffusion process directs the generation toward the specified mode $m_{target}$. This guidance is applied for $t_{stop}$ steps until the Stop Guidance stage, after which unguided diffusion takes over. During sampling, mode guidance ensures that images from the desired mode $m_k$ are generated using the pre-trained diffusion model. If no guidance is applied, the generation follows the unguided (grey) path, which can lead to redundancies in the dataset.
  • Figure 4: Evaluation results across multiple datasets. (a–c) Accuracy of the Text-to-Image model using the soft-label protocol: (a) Nette dataset, (b) IDC dataset, and (c) ImageNet-1K dataset. (d) ImageNet-1K classification accuracy of the DiT + MGD$^3$ model compared to other state-of-the-art (SOTA) methods. All reported values are the mean accuracy over three runs.
  • Figure 5: t-SNE plot showing the original samples (●) and the synthetic samples generated by different diffusion-based methods for two classes (English springer and cassette player) from ImageNet-1K. This visualization shows that DiT [peebles2023scalable] has limited diversity and MinMax diffusion [gu2024efficient] shows diversity but lacks full coverage, while our approach demonstrates mode diversity, achieving higher coverage.
  • ...and 6 more figures
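
The sampling behaviour described in the captions of Figures 2 and 3 (guide toward a target mode for $t_{stop}$ steps, then continue unguided) can be sketched on a 2D toy problem. This is only an illustration under stated assumptions: `toy_denoise_step` is a hypothetical stand-in for a pre-trained denoiser that drifts toward the nearest data mode, and the linear pull toward `m_target` is an assumed form of guidance, not the update rule from the paper.

```python
import numpy as np

def toy_denoise_step(x, data_modes, step=0.1):
    """Hypothetical unguided step: drift toward the nearest data mode."""
    nearest = data_modes[np.linalg.norm(data_modes - x, axis=1).argmin()]
    return x + step * (nearest - x)

def mode_guided_sample(x_T, m_target, data_modes,
                       n_steps=50, t_stop=25, guidance_scale=0.5):
    """Denoise x_T, pulling toward m_target during the first t_stop steps."""
    x = np.asarray(x_T, dtype=float).copy()
    for i in range(n_steps):
        x = toy_denoise_step(x, data_modes)
        if i < t_stop:
            # Mode Guidance (assumed form): nudge the sample toward the target mode.
            x = x + guidance_scale * (m_target - x)
        # For i >= t_stop (Stop Guidance) only the unguided step is applied.
    return x

# Two data modes; the initial noise happens to lie near the first one.
data_modes = np.array([[-2.0, 0.0], [2.0, 0.0]])
x_T = np.array([-1.5, 0.3])

unguided = mode_guided_sample(x_T, data_modes[1], data_modes, t_stop=0)
guided = mode_guided_sample(x_T, data_modes[1], data_modes, t_stop=25)
```

With `t_stop=0` the toy sample settles on the mode nearest to its initial noise, which mirrors the redundancy shown in panel (a) of Figure 2. With guidance enabled, the same noise ends near the requested mode, and the final unguided steps leave it there.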