Accelerating Diffusion Models with One-to-Many Knowledge Distillation

Linfeng Zhang; Kaisheng Ma

TL;DR

This paper introduces one-to-many knowledge distillation (O2MKD), which distills a single teacher diffusion model into multiple student diffusion models, each trained to learn the teacher's knowledge for a subset of continuous timesteps.

Abstract

Diffusion models have driven significant advances in image generation. Compared with previous generative models, however, they incur substantial computational overhead, which prevents real-time generation. Recent approaches accelerate diffusion models by reducing the number of sampling steps through improved sampling techniques or step distillation, but methods that reduce the computational cost of each timestep remain relatively unexplored. Observing that diffusion models exhibit varying input and feature distributions at different timesteps, we introduce one-to-many knowledge distillation (O2MKD), which distills a single teacher diffusion model into multiple student diffusion models, where each student is trained to learn the teacher's knowledge for a subset of continuous timesteps. Experiments on CIFAR10, LSUN Church, and CelebA-HQ with DDPM and on COCO30K with Stable Diffusion show that O2MKD can be applied to previous knowledge distillation and fast sampling methods to achieve significant acceleration. Code will be released on GitHub.
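The timestep-to-student assignment described in the abstract can be illustrated with a minimal sketch. It assumes the timesteps are split into equal-width consecutive groups, which the abstract does not specify, and the names `student_index`, `timestep_ranges`, and `sampling_schedule` are hypothetical helpers, not part of the authors' code.

```python
from typing import List, Tuple


def student_index(t: int, num_timesteps: int, num_students: int) -> int:
    """Return the index of the student responsible for timestep t.

    Assumption: timesteps 0..num_timesteps-1 are split into consecutive,
    (nearly) equal-width groups, one group per student.
    """
    if not 1 <= num_students <= num_timesteps:
        raise ValueError("need 1 <= num_students <= num_timesteps")
    if not 0 <= t < num_timesteps:
        raise ValueError("t must lie in [0, num_timesteps)")
    return t * num_students // num_timesteps


def timestep_ranges(num_timesteps: int, num_students: int) -> List[Tuple[int, int]]:
    """Return the half-open [start, end) timestep range covered by each student."""
    ranges = []
    start = 0
    for t in range(1, num_timesteps + 1):
        # Close the current range at the end, or when the owning student changes.
        if t == num_timesteps or (
            student_index(t, num_timesteps, num_students)
            != student_index(start, num_timesteps, num_students)
        ):
            ranges.append((start, t))
            start = t
    return ranges


def sampling_schedule(num_timesteps: int, num_students: int) -> List[int]:
    """List which student would be called at each reverse-process step (t = T-1 .. 0)."""
    return [
        student_index(t, num_timesteps, num_students)
        for t in reversed(range(num_timesteps))
    ]


if __name__ == "__main__":
    print(timestep_ranges(1000, 4))
    print(sampling_schedule(10, 3))
```

Under this assumed partition, only one student is evaluated at any given sampling step, so the per-step compute is that of a single student even though several students exist.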

Paper Structure (22 sections, 5 equations, 11 figures, 7 tables)

This paper contains 22 sections, 5 equations, 11 figures, 7 tables.

Introduction
Related Work
Diffusion Models
Knowledge Distillation
Methodology
Preliminary
Knowledge Distillation
Traditional one-to-one knowledge distillation
One-to-Many Knowledge Distillation (O2MKD)
Experiment
Experimental Setting
Experimental Results
Quantitative Results
Discussion
Memory Footprint Analysis
...and 7 more sections

Figures (11)

Figure 1: Feature visualization of pre-trained diffusion models on CIFAR10 ($T$=100). (a) Visualization of the feature distribution at timesteps 0 and 100. (b) Box plot of the feature distribution across all timesteps.
Figure 2: Comparison between traditional one-to-one knowledge distillation and the proposed one-to-many knowledge distillation with three students ($N=3$) during training. $T$ indicates the largest timestep.
Figure 3: Comparison between traditional one-to-one knowledge distillation and our O2MKD with three students ($N=3$) in the sampling period. $t$ indicates the timestep.
Figure 4: Qualitative comparison between the students trained with and without our method.
Figure 5: O2MKD with different DDIM sampling steps.
...and 6 more figures
