DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture

Qianlong Xiang, Miao Zhang, Yuzhang Shang, Jianlong Wu, Yan Yan, Liqiang Nie

TL;DR

This work tackles the escalating data requirements of diffusion models by introducing Data-Free Knowledge Distillation for Diffusion Models (DKDM), which trains new DMs with any architecture using existing pretrained DMs as the data source. It introduces a DKDM objective that eliminates dependence on inaccessible data and the diffusion posterior, and a dynamic iterative distillation framework that efficiently harvests time-domain knowledge from teachers. Empirical results across pixel and latent spaces show that DKDM produces competitive or superior generative performance compared to data-based training, and even enables cross-architecture distillation between CNNs and ViTs. The proposed approach significantly reduces data acquisition and storage burdens while maintaining or improving quality, offering a practical pathway for data-free diffusion-model development.

Abstract

Diffusion models (DMs) have demonstrated exceptional generative capabilities across various domains, such as image and video generation. A key factor contributing to their effectiveness is the high quantity and quality of data used during training. However, mainstream DMs now consume increasingly large amounts of data. For example, training a Stable Diffusion model requires billions of image-text pairs. This enormous data requirement poses significant challenges for training large DMs due to high data acquisition costs and storage expenses. To alleviate this data burden, we propose a novel scenario: using existing DMs as data sources to train new DMs with any architecture. We refer to this scenario as Data-Free Knowledge Distillation for Diffusion Models (DKDM), where the generative ability of DMs is transferred to new ones in a data-free manner. To tackle this challenge, we make two main contributions. First, we introduce a DKDM objective that enables the training of new DMs via distillation, without requiring access to the data. Second, we develop a dynamic iterative distillation method that efficiently extracts time-domain knowledge from existing DMs, enabling direct retrieval of training data without the need for a prolonged generative process. To the best of our knowledge, we are the first to explore this scenario. Experimental results demonstrate that our data-free approach not only achieves competitive generative performance but also, in some instances, outperforms models trained with the entire dataset.

Paper Structure

This paper contains 12 sections, 20 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustration of our DKDM concept: utilizing pretrained diffusion models to train new ones, thus avoiding the high costs associated with increasingly large datasets.
  • Figure 2: Illustration of our DKDM Paradigm. (a): standard data-based training of DMs. (b): a straightforward data-free training approach. (c): our proposed framework for DKDM.
  • Figure 3: Dynamic Iterative Distillation: An enlarged batch set is first constructed by sampling from a Gaussian distribution. Next, shuffle denoise is applied, in which each sample is denoised a random number of times. A batch is then randomly selected from this enlarged set to train the student, and the denoised results replace their counterparts in the batch set. This process is repeated iteratively.
  • Figure 4: FID scores of analytical experiments on CIFAR10. (a): Ablation on dynamic iterative distillation with $\rho=0.4$. (b): Effect of different $\rho$.
  • Figure 5: Selected samples generated by our student models across five datasets.
  • ...and 1 more figure
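
The dynamic iterative distillation loop described in the Figure 3 caption can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `teacher_denoise_step`, `init_pool`, `distillation_iteration`, and the `student_update` callback are hypothetical names, the teacher is a placeholder function rather than a pretrained network, and restarting fully denoised samples from fresh Gaussian noise is an assumption the caption does not state.

```python
import numpy as np


def teacher_denoise_step(x_t, t, rng):
    """Hypothetical stand-in for one reverse step of a pretrained teacher.

    A real teacher would be a neural network; this toy rule only shrinks
    the sample and adds a little noise (no noise at the final step).
    """
    noise = rng.standard_normal(x_t.shape) if t > 1 else np.zeros_like(x_t)
    return 0.9 * x_t + 0.1 * noise


def init_pool(pool_size, dim, num_steps, rng):
    """Build the enlarged batch set from Gaussian noise, then shuffle denoise."""
    x = rng.standard_normal((pool_size, dim))
    t = np.full(pool_size, num_steps)
    # Shuffle denoise: each sample is denoised a random number of times,
    # so the pool covers many different timesteps.
    steps = rng.integers(0, num_steps, size=pool_size)
    for i in range(pool_size):
        for _ in range(steps[i]):
            x[i] = teacher_denoise_step(x[i], t[i], rng)
            t[i] -= 1
    return x, t


def distillation_iteration(x, t, batch_size, num_steps, rng, student_update):
    """Run one iteration: pick a batch, query the teacher, update the pool."""
    idx = rng.choice(len(x), size=batch_size, replace=False)
    for i in idx:
        x_prev = teacher_denoise_step(x[i], t[i], rng)
        # Assumption: the student is trained to imitate the teacher's step.
        student_update(x[i].copy(), int(t[i]), x_prev.copy())
        if t[i] > 1:
            # The denoised result replaces its counterpart in the batch set.
            x[i] = x_prev
            t[i] -= 1
        else:
            # Assumption: a fully denoised sample restarts from fresh noise.
            x[i] = rng.standard_normal(x.shape[1])
            t[i] = num_steps
    return idx
```

Calling `init_pool` once and then `distillation_iteration` in a loop gives the student a fresh batch at mixed timesteps on every iteration, while the teacher performs only one denoising step per selected sample instead of a full generative pass.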