Accelerating Diffusion Models with One-to-Many Knowledge Distillation

Linfeng Zhang, Kaisheng Ma

TL;DR

We introduce one-to-many knowledge distillation (O2MKD), which distills a single teacher diffusion model into multiple student diffusion models, each trained to learn the teacher's knowledge for a subset of continuous timesteps.

Abstract

Diffusion models have driven significant advances in image generation. Compared with earlier generative models, however, they incur substantial computational overhead, which rules out real-time generation. Recent approaches accelerate diffusion models by reducing the number of sampling steps, either through improved sampling techniques or through step distillation. Methods that reduce the computational cost of each individual timestep, by contrast, remain relatively unexplored. Observing that diffusion models exhibit different input distributions and feature distributions at different timesteps, we introduce one-to-many knowledge distillation (O2MKD), which distills a single teacher diffusion model into multiple student diffusion models, each trained to learn the teacher's knowledge for a subset of continuous timesteps. Experiments on CIFAR10, LSUN Church, and CelebA-HQ with DDPM, and on COCO30K with Stable Diffusion, show that O2MKD can be combined with previous knowledge distillation and fast sampling methods to achieve significant acceleration. Code will be released on GitHub.
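The assignment of timesteps to students described in the abstract can be illustrated with a small sketch. It assumes that the $T$ timesteps are indexed `0 .. T-1` and split into $N$ equal-width, non-overlapping ranges; the abstract does not specify how the split is made, so the function names `timestep_ranges` and `student_for_timestep` and the equal-width rule are hypothetical choices for illustration only.

```python
from bisect import bisect_right
from typing import List, Tuple


def timestep_ranges(num_timesteps: int, num_students: int) -> List[Tuple[int, int]]:
    """Split [0, num_timesteps) into num_students half-open ranges [start, end).

    Assumption: equal-width ranges (up to rounding). The source only says that
    each student covers "a subset of continuous timesteps".
    """
    if num_students < 1 or num_timesteps < num_students:
        raise ValueError("need 1 <= num_students <= num_timesteps")
    bounds = [i * num_timesteps // num_students for i in range(num_students + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def student_for_timestep(t: int, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the student whose range contains timestep t."""
    if not ranges[0][0] <= t < ranges[-1][1]:
        raise ValueError(f"timestep {t} is outside the covered range")
    starts = [start for start, _ in ranges]
    # The owning range is the last one whose start is <= t.
    return bisect_right(starts, t) - 1


if __name__ == "__main__":
    ranges = timestep_ranges(num_timesteps=1000, num_students=3)
    for t in (0, 332, 333, 999):
        print(t, "-> student", student_for_timestep(t, ranges))
```

During training, a timestep sampled for a batch would be used to update only the student returned by `student_for_timestep`, so each student sees the teacher's outputs for its own range and nothing else.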

Paper Structure (22 sections, 5 equations, 11 figures, 7 tables)


Figures (11)

  • Figure 1: Feature visualization of pre-trained diffusion models on CIFAR10 ($T$=100). (a) Visualization of the feature distribution at timesteps 0 and 100. (b) Box plot of the feature distribution at every timestep.
  • Figure 2: Comparison between traditional one-to-one knowledge distillation and the proposed one-to-many knowledge distillation with three students ($N=3$) in the training period. $T$ denotes the largest timestep.
  • Figure 3: Comparison between traditional one-to-one knowledge distillation and our O2MKD with three students ($N=3$) in the sampling period. $t$ indicates the timestep.
  • Figure 4: Qualitative comparison between the students trained with and without our method.
  • Figure 5: O2MKD with different DDIM sampling steps.
  • ...and 6 more figures
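The sampling-period comparison in Figure 3 can be pictured as a dispatch loop: at each timestep $t$, exactly one student is evaluated. The sketch below is a toy stand-in, not the authors' sampler. The students are plain scalar functions rather than networks, the update `x = student(x, t)` replaces the real denoising step, and the equal-width routing rule `t * N // T` is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# Toy stand-in for a student network: maps (x_t, t) to the next sample.
Denoiser = Callable[[float, int], float]


def sample_with_students(
    x: float, num_timesteps: int, students: Sequence[Denoiser]
) -> Tuple[float, List[int]]:
    """Run a reverse pass from t = T-1 down to 0, one student call per step.

    Returns the final sample and how many times each student was called.
    """
    n = len(students)
    calls = [0] * n
    for t in reversed(range(num_timesteps)):
        i = t * n // num_timesteps  # assumed equal-width routing rule
        x = students[i](x, t)       # placeholder for the real update step
        calls[i] += 1
    return x, calls


if __name__ == "__main__":
    # Three hypothetical students that leave distinguishable marks on x.
    students = [lambda x, t, k=k: x + 10 ** k for k in range(3)]
    final, calls = sample_with_students(0.0, num_timesteps=9, students=students)
    print(final, calls)
```

In this sketch the total number of student evaluations equals the number of timesteps, the same as sampling with a single model, so any speedup would have to come from each student being cheaper to evaluate than the teacher.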