Data-free Knowledge Distillation with Diffusion Models

Xiaohua Qi; Renda Li; Long Peng; Qiang Ling; Jun Yu; Ziyi Chen; Peng Chang; Mei Han; Jing Xiao

TL;DR

The paper tackles data-free knowledge distillation (DFKD) by synthesizing training data with pre-trained diffusion models guided by information from the teacher model. It defines an inversion loss that combines Batch Normalization regularization, class priors, and adversarial distillation, $L_{inv} = \alpha L_{bn} + \beta L_{cls} + \gamma L_{adv}$, and uses it to update the latent representation with a single gradient step per diffusion timestep. To increase data diversity at low cost, it proposes Latent CutMix Augmentation, which applies CutMix in the latent space every $k$ steps and relies on inpainting to repair the resulting artifacts. The KD objective on the generated data combines a CAM-consistency term (mSARC) with the KL divergence between teacher and student outputs. The method reports state-of-the-art results on CIFAR-10/100, Tiny-ImageNet, and DomainNet, and the code is released in the project repository.
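As a rough illustration of two quantities named above, the sketch below computes the weighted inversion loss $L_{inv}$ from three already-evaluated scalar terms and a temperature-softened KL divergence between teacher and student logits. It is not the authors' implementation: the function names, the default weights, and the temperature parameter are assumptions made for this example, and the mSARC term and the diffusion update itself are left out.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def kl_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) on temperature-softened class distributions.

    The temperature is a hypothetical knob; the summary above does not state one.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def inversion_loss(l_bn, l_cls, l_adv, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum L_inv = alpha * L_bn + beta * L_cls + gamma * L_adv.

    The three inputs are assumed to be scalar losses computed elsewhere;
    the default weights are placeholders, not values from the paper.
    """
    return alpha * l_bn + beta * l_cls + gamma * l_adv
```

The KL term is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is why it serves as the basic distillation signal on the generated data.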

Abstract

Recently, Data-Free Knowledge Distillation (DFKD) has garnered attention because it can transfer knowledge from a teacher neural network to a student neural network without requiring any access to training data. Although diffusion models are adept at synthesizing high-fidelity photorealistic images across various domains, existing methods cannot be easily applied to DFKD. To bridge that gap, this paper proposes a novel approach based on diffusion models, DiffDFKD. Specifically, DiffDFKD involves targeted optimizations in two key areas. First, DiffDFKD utilizes valuable information from teacher models to guide the pre-trained diffusion models' data synthesis, generating datasets that mirror the training data distribution and effectively bridge domain gaps. Second, to reduce the computational burden, DiffDFKD introduces Latent CutMix Augmentation, an efficient technique that enhances the diversity of diffusion model-generated images for DFKD while preserving key attributes for effective knowledge transfer. Extensive experiments validate the efficacy of DiffDFKD, which yields state-of-the-art results exceeding existing DFKD approaches. We release our code at https://github.com/xhqi0109/DiffDFKD.
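To make the idea of CutMix in latent space more concrete, here is a minimal sketch using plain nested lists as stand-ins for latent tensors of shape C x H x W. It follows the standard CutMix box-sampling recipe (box side proportional to the square root of one minus the mixing ratio) and pastes that region from one latent into another. The function names and the list-based representation are assumptions for illustration; the paper's schedule for when to mix and its inpainting-based artifact repair are not modeled here.

```python
import random


def sample_box(height, width, lam, rng):
    """Sample a CutMix box covering roughly (1 - lam) of an H x W map."""
    cut_ratio = (1.0 - lam) ** 0.5
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.randrange(height), rng.randrange(width)
    # Clip the box to the map so the returned bounds are always valid.
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    return top, bottom, left, right


def latent_cutmix(latent_a, latent_b, box):
    """Copy latent_a and overwrite the box region with values from latent_b.

    Both latents are C x H x W nested lists (hypothetical stand-ins for
    diffusion latents); latent_a itself is left unmodified.
    """
    top, bottom, left, right = box
    mixed = [[row[:] for row in channel] for channel in latent_a]
    for c, channel in enumerate(latent_b):
        for y in range(top, bottom):
            mixed[c][y][left:right] = channel[y][left:right]
    return mixed
```

A caller would typically draw the box with a seeded generator, for example `sample_box(8, 8, 0.5, random.Random(0))`, and apply the same box across all channels so that the pasted region stays spatially aligned.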
