DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture

Qianlong Xiang, Miao Zhang, Yuzhang Shang, Jianlong Wu, Yan Yan, Liqiang Nie

TL;DR

This work addresses the growing data requirements of diffusion models (DMs) by introducing Data-Free Knowledge Distillation for Diffusion Models (DKDM), which trains new DMs of any architecture using existing pretrained DMs as the data source. It introduces a DKDM objective that removes the dependence on inaccessible training data and on the diffusion posterior, together with a dynamic iterative distillation framework that efficiently harvests time-domain knowledge from teacher models. Experiments in both pixel and latent spaces show that DKDM achieves generative performance competitive with data-based training, in some cases surpassing it, and even enables cross-architecture distillation between CNNs and ViTs. The approach reduces data acquisition and storage burdens while maintaining generation quality, offering a practical path toward data-free development of diffusion models.

Abstract

Diffusion models (DMs) have demonstrated exceptional generative capabilities across various domains, including image and video generation. A key factor contributing to their effectiveness is the high quantity and quality of the data used during training. However, mainstream DMs now consume increasingly large amounts of data. For example, training a Stable Diffusion model requires billions of image-text pairs. This enormous data requirement makes training large DMs difficult because of high data acquisition and storage costs. To alleviate this data burden, we propose a novel scenario: using existing DMs as data sources to train new DMs with any architecture. We refer to this scenario as Data-Free Knowledge Distillation for Diffusion Models (DKDM), where the generative ability of DMs is transferred to new ones in a data-free manner. To tackle this challenge, we make two main contributions. First, we introduce a DKDM objective that enables the training of new DMs via distillation, without requiring access to the data. Second, we develop a dynamic iterative distillation method that efficiently extracts time-domain knowledge from existing DMs, so that training data can be retrieved directly without running a prolonged generative process. To the best of our knowledge, we are the first to explore this scenario. Experimental results demonstrate that our data-free approach not only achieves competitive generative performance but also, in some instances, outperforms models trained with the entire dataset.
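For orientation, the difference between data-based training and the data-free setting can be written against the standard simplified DDPM loss. The second line below is a hedged reading of the DKDM objective summarized above, not a reproduction of the paper's equation: the notation $\epsilon_{\theta_T}$ (teacher), $\epsilon_{\theta_S}$ (student), and $\hat p_{\theta_T}(x_t)$ (the distribution of $x_t$ reached by running the teacher's reverse process from $x_T \sim \mathcal{N}(0, I)$) is assumed here, and the paper's full objective may contain additional terms.

$$\mathcal{L}_{\text{simple}} = \mathbb{E}_{x_0 \sim q(x_0),\, t,\, \epsilon \sim \mathcal{N}(0, I)} \left[ \left\| \epsilon - \epsilon_{\theta_S}\!\left(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,\; t\right) \right\|^2 \right]$$

$$\mathcal{L}'_{\text{simple}} = \mathbb{E}_{t,\, x_t \sim \hat p_{\theta_T}(x_t)} \left[ \left\| \epsilon_{\theta_T}(x_t, t) - \epsilon_{\theta_S}(x_t, t) \right\|^2 \right]$$

The first loss needs real samples $x_0$ and the noise $\epsilon$ that produced $x_t$; the second needs neither, because both its input $x_t$ and its regression target come from the teacher.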

Paper Structure

This paper contains 12 sections, 20 equations, 6 figures, 5 tables, and 1 algorithm.

Introduction
Preliminaries on Diffusion Models
Data-Free Knowledge Distillation for Diffusion Models
  DKDM Paradigm
  DKDM Objective
  Efficient Collection of Knowledge
Experiments
  Experiment Setting
  Main Results
  Ablation Study
Related Work
Conclusion

Figures (6)

Figure 1: Illustration of our DKDM concept: utilizing pretrained diffusion models to train new ones, thus avoiding the high costs associated with increasingly large datasets.
Figure 2: Illustration of our DKDM Paradigm. (a): standard data-based training of DMs. (b): a straightforward data-free training approach. (c): our proposed framework for DKDM.
Figure 3: Dynamic Iterative Distillation. An enlarged batch set is first constructed by sampling from a Gaussian distribution. Next, shuffle denoise is applied, in which each sample is denoised a random number of times. A batch is then randomly selected from this enlarged set to train the student, and the denoised results replace their counterparts in the batch set. This process is repeated iteratively.
Figure 4: FID scores from analysis experiments on CIFAR10. (a): Ablation of dynamic iterative distillation with $\rho=0.4$. (b): Effect of different values of $\rho$.
Figure 5: Selected samples generated by our student models across five datasets.
...and 1 more figure
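The dynamic iterative distillation loop described in the Figure 3 caption can be sketched on a toy 1-D problem. This is a minimal, hypothetical illustration, not the authors' implementation. Assumptions: the "teacher" is the exact noise predictor for Gaussian data N(2, 0.5²) under a DDPM schedule, the student is a per-timestep linear model, each picked sample is advanced by exactly one teacher denoising step (one reading of the caption's "denoised results"), and the names `teacher_eps`, `teacher_step`, `LinearStudent`, `shuffle_denoise`, `enlarge_factor`, and all schedule values are invented for this sketch. `enlarge_factor` only sets the size of the enlarged batch set; how it relates to the paper's $\rho$ is not stated on this page.

```python
import math
import random
from itertools import accumulate
from operator import mul

# Hypothetical toy schedule: T DDPM steps with a linear beta schedule.
T = 20
BETAS = [1e-4 + (0.3 - 1e-4) * i / (T - 1) for i in range(T)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = list(accumulate(ALPHAS, mul))
DATA_MU, DATA_STD = 2.0, 0.5  # toy data distribution the "teacher" has learned


def teacher_eps(x, t):
    """Stand-in pretrained teacher: exact E[eps | x_t] for N(DATA_MU, DATA_STD^2) data."""
    ab = ALPHA_BARS[t]
    var = ab * DATA_STD ** 2 + (1.0 - ab)
    return math.sqrt(1.0 - ab) * (x - math.sqrt(ab) * DATA_MU) / var


def teacher_step(x, t, rng):
    """One ancestral DDPM reverse step x_t -> x_{t-1} driven by the teacher."""
    eps = teacher_eps(x, t)
    mean = (x - BETAS[t] / math.sqrt(1.0 - ALPHA_BARS[t]) * eps) / math.sqrt(ALPHAS[t])
    noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
    return mean + math.sqrt(BETAS[t]) * noise


class LinearStudent:
    """Hypothetical student with a different 'architecture': eps_S(x, t) = w[t] * x + b[t]."""

    def __init__(self):
        self.w = [0.0] * T
        self.b = [0.0] * T

    def __call__(self, x, t):
        return self.w[t] * x + self.b[t]

    def sgd_step(self, batch, targets, lr):
        """One SGD step on the mean squared error to the teacher's noise predictions."""
        grad_w, grad_b, n = [0.0] * T, [0.0] * T, len(batch)
        for (x, t), y in zip(batch, targets):
            err = self(x, t) - y
            grad_w[t] += 2.0 * err * x / n
            grad_b[t] += 2.0 * err / n
        for t in range(T):
            self.w[t] -= lr * grad_w[t]
            self.b[t] -= lr * grad_b[t]


def shuffle_denoise(x, t, rng):
    """Denoise one sample with the teacher a random number of steps (never past t = 0)."""
    for _ in range(rng.randrange(t + 1)):
        x = teacher_step(x, t, rng)
        t -= 1
    return x, t


def dynamic_iterative_distillation(student, batch_size=16, enlarge_factor=8,
                                   iters=4000, lr=0.3, seed=0):
    rng = random.Random(seed)
    # 1) Enlarged batch set of pure-noise samples x_T ~ N(0, 1) at the last timestep.
    pool = [(rng.gauss(0.0, 1.0), T - 1) for _ in range(batch_size * enlarge_factor)]
    # 2) Shuffle denoise so the set covers many timesteps at once.
    pool = [shuffle_denoise(x, t, rng) for x, t in pool]
    for _ in range(iters):
        # 3) Randomly pick a batch and distill the teacher's predictions into the student.
        idx = rng.sample(range(len(pool)), batch_size)
        batch = [pool[i] for i in idx]
        student.sgd_step(batch, [teacher_eps(x, t) for x, t in batch], lr)
        # 4) Replace the picked samples with their one-step-denoised versions;
        #    samples already at t = 0 restart from fresh noise.
        for i, (x, t) in zip(idx, batch):
            if t > 0:
                pool[i] = (teacher_step(x, t, rng), t - 1)
            else:
                pool[i] = (rng.gauss(0.0, 1.0), T - 1)
    return student


def distillation_mse(student, n=500, seed=1):
    """Mean squared gap between student and teacher on fresh teacher-generated x_t."""
    rng = random.Random(seed)
    pairs = [shuffle_denoise(rng.gauss(0.0, 1.0), T - 1, rng) for _ in range(n)]
    return sum((student(x, t) - teacher_eps(x, t)) ** 2 for x, t in pairs) / n


if __name__ == "__main__":
    before = distillation_mse(LinearStudent())
    after = distillation_mse(dynamic_iterative_distillation(LinearStudent()))
    print(f"MSE to teacher before: {before:.4f}, after: {after:.6f}")
```

In this sketch no real data is ever touched: every training input and target comes from the teacher. Because the pool already spans many timesteps after shuffle denoise, each student update only costs one extra teacher step per picked sample, instead of a full reverse trajectory per batch.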
