D$^4$M: Dataset Distillation via Disentangled Diffusion Model

Duo Su, Junjie Hou, Weizhi Gao, Yingjie Tian, Bowen Tang

TL;DR

D$^4$M is a diffusion-based dataset distillation framework designed to generalize across architectures. It replaces synthesis-time matching with prototype-guided latent diffusion and Training-Time Matching. Category prototypes are extracted by clustering in a latent space, and a Latent Diffusion Model conditioned on these prototypes and on text prompts generates high-resolution, realistic synthetic images without architecture-specific matching. Training-Time Matching with soft labels further improves cross-architecture generalization and makes distillation scalable to ImageNet-1K and other large-scale datasets. The approach reports state-of-the-art or competitive results across benchmarks at lower computational cost, and the distilled datasets transfer stably across architectures.
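
As a rough illustration of training with soft labels, the sketch below computes a loss between a student network's logits and a teacher's soft labels. It assumes a KL divergence over temperature-softened distributions, which is a common choice in distillation but is not specified in this summary; the function names `log_softmax` and `soft_label_kl` and the `temperature` parameter are hypothetical.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]

def soft_label_kl(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: the soft labels are the teacher's softened class
    probabilities; the loss actually used by D^4M may differ.
    """
    log_p = log_softmax(teacher_logits, temperature)  # soft labels
    log_q = log_softmax(student_logits, temperature)  # student prediction
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

# Toy usage: the loss is zero when the student matches the teacher.
teacher = [1.0, 2.0, 3.0]
print(soft_label_kl(teacher, teacher))
print(soft_label_kl([3.0, 2.0, 1.0], teacher))
```

Because the soft labels come from a separately trained teacher rather than from the architecture being trained, the same distilled images can be reused for different student networks.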

Abstract

Dataset distillation offers a lightweight synthetic dataset for fast network training with promising test accuracy. To imitate the performance of the original dataset, most approaches employ bi-level optimization, and the distillation space relies on the matching architecture. Nevertheless, these approaches either incur significant computational costs on large-scale datasets or suffer a performance decline across architectures. We advocate for designing an economical dataset distillation framework that is independent of the matching architectures. Based on empirical observations, we argue that constraining the consistency of the real and synthetic image spaces will enhance cross-architecture generalization. Motivated by this, we introduce Dataset Distillation via Disentangled Diffusion Model (D$^4$M), an efficient framework for dataset distillation. Compared to architecture-dependent methods, D$^4$M employs a latent diffusion model to guarantee consistency and incorporates label information into category prototypes. The distilled datasets are versatile, eliminating the need to repeatedly generate distinct datasets for different architectures. Through comprehensive experiments, D$^4$M demonstrates superior performance and robust generalization, surpassing the SOTA methods in most aspects.
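
The idea of folding label information into category prototypes can be sketched as per-class clustering of latent codes. The example below is a minimal illustration, assuming plain Lloyd's k-means on toy 2-D vectors that stand in for latents from a diffusion model's encoder; the helper names `kmeans` and `class_prototypes` are hypothetical, and the clustering variant used in the paper may differ.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean).
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                dim = len(members[0])
                centroids[j] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centroids

def class_prototypes(latents, labels, per_class):
    """Cluster the latents of each class separately.

    Returns {label: [prototype, ...]} with `per_class` prototypes per label,
    so every prototype carries the label of the class it was fit on.
    """
    by_class = {}
    for z, y in zip(latents, labels):
        by_class.setdefault(y, []).append(z)
    return {y: kmeans(zs, per_class) for y, zs in by_class.items()}

# Toy usage: two classes, each with two well-separated modes.
latents = [[0, 0], [0, 1], [10, 10], [10, 11],
           [50, 0], [50, 1], [60, 10], [60, 11]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
protos = class_prototypes(latents, labels, per_class=2)
print(sorted(protos[0]))
print(sorted(protos[1]))
```

In this reading, `per_class` plays the role of the images-per-class budget: each prototype would later condition the generation of one synthetic image for its class.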

Paper Structure

This paper contains 23 sections, 8 equations, 22 figures, 9 tables, 1 algorithm.

Figures (22)

  • Figure 1: Comparison of various matching strategies in dataset distillation. (a) Bi-level optimization implements data matching at synthesis time. (b) The Dual-Time Matching strategy decouples the bi-level optimization process into synthesis time and training time to save computational overhead. (c) D$^4$M utilizes multi-modal features (images and text) to synthesize high-quality images and does not require a matching process at synthesis time.
  • Figure 2: Visualizations of previous DD methods. Synthesis-Time Matching sacrifices part of the visual semantic expression in order to imitate the performance of the original dataset.
  • Figure 3: Pipeline of Dataset Distillation via Disentangled Diffusion Model (D$^4$M). Rather than using the embedded features directly, D$^4$M disentangles feature extraction from image generation in diffusion models through prototype learning.
  • Figure 4: Visualization results. For each dataset, the top row comes from D$^4$M and the bottom row from SRe$^2$L [yin2023squeeze] (ImageNet-1K and Tiny-ImageNet) or MTT [cazenavette2022dataset] (CIFAR-10/100). The images generated by D$^4$M have higher resolution and are more lifelike.
  • Figure 5: Visualization results within one category. D$^4$M (top) provides richer semantic information than SRe$^2$L.
  • ...and 17 more figures