Data-free Knowledge Distillation with Diffusion Models

Xiaohua Qi, Renda Li, Long Peng, Qiang Ling, Jun Yu, Ziyi Chen, Peng Chang, Mei Han, Jing Xiao

TL;DR

The paper tackles data-free knowledge distillation (DFKD) by generating synthetic data with pre-trained diffusion models guided by teacher-model information. It introduces an inversion loss that combines Batch Normalization regularization, class priors, and adversarial distillation, formalized as $L_{inv} = \alpha L_{bn} + \beta L_{cls} + \gamma L_{adv}$, and updates latent representations during diffusion with a single gradient step per timestep. To boost data diversity efficiently, it introduces Latent CutMix Augmentation, applying CutMix in the latent space every $k$ steps and using inpainting to repair artifacts. The KD objective on the generated data fuses CAM-consistency (mSARC) with KL-divergence between teacher and student outputs, yielding state-of-the-art results on CIFAR-10/100, Tiny-ImageNet, and DomainNet, with code released at the project repository.
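The two scalar objectives named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `temperature` parameter, and the KL direction (teacher relative to student) are assumptions made for illustration, and the CAM-consistency (mSARC) term is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; `temperature` is an assumed knob, not from the paper."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def kd_kl_loss(teacher_logits, student_logits, temperature=1.0):
    """KL term between teacher and student outputs (direction is an assumption)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)


def inversion_loss(l_bn, l_cls, l_adv, alpha, beta, gamma):
    """Weighted sum L_inv = alpha * L_bn + beta * L_cls + gamma * L_adv."""
    return alpha * l_bn + beta * l_cls + gamma * l_adv
```

The KL term is zero when the student reproduces the teacher's logits and grows as the two output distributions diverge; the weights `alpha`, `beta`, and `gamma` only rescale the three inversion terms before they are summed.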

Abstract

Recently, Data-Free Knowledge Distillation (DFKD) has garnered attention for its ability to transfer knowledge from a teacher neural network to a student neural network without requiring any access to the training data. Although diffusion models are adept at synthesizing high-fidelity photorealistic images across various domains, existing methods cannot be easily applied to DFKD. To bridge that gap, this paper proposes DiffDFKD, a novel approach based on diffusion models. Specifically, DiffDFKD involves targeted optimizations in two key areas. First, DiffDFKD uses valuable information from teacher models to guide the pre-trained diffusion models' data synthesis, generating datasets that mirror the training data distribution and effectively bridge domain gaps. Second, to reduce computational burdens, DiffDFKD introduces Latent CutMix Augmentation, an efficient technique that enhances the diversity of diffusion model-generated images for DFKD while preserving key attributes for effective knowledge transfer. Extensive experiments validate the efficacy of DiffDFKD, yielding state-of-the-art results that exceed existing DFKD approaches. We release our code at https://github.com/xhqi0109/DiffDFKD.

Paper Structure

This paper contains 23 sections, 13 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Overview of the proposed DiffDFKD framework. At each step $t$: (1) the latent variable $\mathbf{z}_t$ is processed through a pre-trained diffusion model to obtain a prediction $\hat{\mathbf{z}}_{0,t}$, which is used to compute the loss; (2) the combined loss is backpropagated to update $\mathbf{z}_t$; (3) the updated $\mathbf{z}_t$ undergoes CutMix augmentation and denoising to produce $\mathbf{z}_{t-1}$ for the next iteration. After $T$ steps, a synthetic dataset is generated, facilitating the knowledge distillation process. Note that in the figure, the Image Generation and Knowledge Distillation stages utilize the same teacher-student model pair.
  • Figure 2: Data inverted from a ResNet-34 teacher pre-trained on CIFAR-10, with a ResNet-18 student.
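
The three steps in the Figure 1 caption can be sketched as a toy loop. Everything here is a hypothetical stand-in: `predict_z0` replaces the pre-trained diffusion model with a linear map, `toy_loss_and_grad` replaces the combined loss with a quadratic whose gradient is known in closed form, `toy_denoise` replaces the sampler step, and the box-swap `latent_cutmix` is the standard CutMix recipe applied to latent tensors. The inpainting repair is omitted. The sketch only shows the control flow: one gradient step on $\mathbf{z}_t$ per timestep, CutMix every `k` steps, then denoising.

```python
import numpy as np

DENOISER_SCALE = 0.9  # hypothetical: makes the toy "diffusion model" a linear map


def predict_z0(z_t):
    """Toy stand-in for the diffusion model's clean-latent prediction."""
    return DENOISER_SCALE * z_t


def toy_loss_and_grad(z_t, target):
    """Quadratic placeholder for the combined loss and its gradient w.r.t. z_t."""
    diff = predict_z0(z_t) - target
    loss = 0.5 * float(np.sum(diff ** 2))
    grad = DENOISER_SCALE * diff  # chain rule through the linear predictor
    return loss, grad


def guided_step(z_t, target, lr):
    """Step (2): a single gradient step on the latent."""
    _, grad = toy_loss_and_grad(z_t, target)
    return z_t - lr * grad


def latent_cutmix(z, rng, ratio=0.5):
    """Paste one rectangular patch from a shuffled batch partner into each latent.

    z has shape (batch, channels, height, width).
    """
    b, _, h, w = z.shape
    perm = rng.permutation(b)
    bh, bw = max(1, int(h * ratio)), max(1, int(w * ratio))
    top = rng.integers(0, h - bh + 1)
    left = rng.integers(0, w - bw + 1)
    mixed = z.copy()
    mixed[:, :, top:top + bh, left:left + bw] = z[perm][:, :, top:top + bh, left:left + bw]
    return mixed


def toy_denoise(z_t):
    """Placeholder for one sampler step z_t -> z_{t-1}."""
    return 0.98 * z_t


def run_sketch(T=20, k=5, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((4, 4, 8, 8))  # (batch, channels, height, width)
    target = np.ones_like(z)               # hypothetical target for the toy loss
    for t in range(T, 0, -1):
        z = guided_step(z, target, lr)     # steps (1) and (2)
        if t % k == 0:
            z = latent_cutmix(z, rng)      # step (3), applied every k steps
        z = toy_denoise(z)                 # step (3), produce z_{t-1}
    return z
```

In a real pipeline the gradient would come from automatic differentiation through the diffusion model and the teacher network rather than from a closed-form expression.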