SFDDM: Single-fold Distillation for Diffusion models

Chi Hong, Jiyue Huang, Robert Birke, Dick Epema, Stefanie Roos, Lydia Y. Chen

TL;DR

This paper tackles the slow inference of diffusion models caused by a large number of sampling steps $T$. It introduces SFDDM, a single-fold distillation framework that constructs a forward process for a $T'$-step student diffusion model aligned to a $T$-step teacher by sampling a subsequence of the teacher's steps with ratio $c=T/T'$, and trains the student to match both the teacher's outputs and its hidden-variable distributions. The method supports distillation to an arbitrary target number of steps $T'$, including very small ones, and extends to flexible subsequences via an increasing index set $\{\phi_0,\ldots,\phi_{T'}\}$. Empirical results on CIFAR-10, CelebA-HQ, LSUN-Church, and LSUN-Bedroom show SFDDM achieving state-of-the-art FID scores and enabling data generation with as few as ~1% of the teacher steps, while preserving semantic consistency and meaningful interpolation between samples.
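The subsequence construction can be illustrated with a minimal sketch. It assumes the standard DDPM variance-preserving forward process, $q(\boldsymbol{x}_t \mid \boldsymbol{x}_0)=\mathcal{N}(\sqrt{\bar\alpha_t}\,\boldsymbol{x}_0,(1-\bar\alpha_t)\boldsymbol{I})$, and a linear $\beta$ schedule; neither is confirmed by this summary, and the function names `teacher_alpha_bar`, `uniform_indices`, and `student_schedule` are hypothetical. Under those assumptions, picking teacher indices $\phi_t = c\,t$ and setting the student's cumulative coefficient to $\bar\alpha'_t=\bar\alpha_{\phi_t}$ makes the student marginal at step $t$ coincide with the teacher marginal at step $\phi_t$.

```python
import numpy as np


def teacher_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bar_t for t = 0..T under an ASSUMED linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar = np.cumprod(1.0 - betas)
    # Index 0 is the clean data x_0, for which alpha_bar = 1.
    return np.concatenate(([1.0], alpha_bar))


def uniform_indices(T, T_student):
    """Increasing index set phi_0..phi_{T'} with constant ratio c = T / T'."""
    if T % T_student != 0:
        raise ValueError("this sketch requires T to be divisible by T'")
    c = T // T_student
    return c * np.arange(T_student + 1)


def student_schedule(alpha_bar, phi):
    """Student coefficients whose marginals match the teacher at indices phi.

    Any strictly increasing phi with phi[0] = 0 works, not only uniform ones.
    """
    phi = np.asarray(phi)
    if phi[0] != 0 or np.any(np.diff(phi) <= 0):
        raise ValueError("phi must start at 0 and be strictly increasing")
    alpha_bar_s = alpha_bar[phi]
    # Per-step noise of a Markov chain with those cumulative products.
    betas_s = 1.0 - alpha_bar_s[1:] / alpha_bar_s[:-1]
    return alpha_bar_s, betas_s


if __name__ == "__main__":
    # Toy sizes from Figure 1 of the paper: T = 9, T' = 3.
    ab = teacher_alpha_bar(9)
    phi = uniform_indices(9, 3)
    ab_s, betas_s = student_schedule(ab, phi)
    print("teacher indices:", phi)
    print("student betas:", betas_s)
```

With $T=9$ and $T'=3$, student step 2 maps to teacher step 6, which is the alignment Figure 1 of the paper uses as its example. How the paper itself parameterizes the student's transition kernels may differ from this sketch.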

Abstract

While diffusion models generate remarkable synthetic images, a key limitation is their inference inefficiency: sampling requires numerous steps. To accelerate inference while maintaining high-quality synthesis, teacher-student distillation is applied to compress diffusion models in a progressive, binary manner through repeated retraining, e.g., reducing a 1024-step model to a 128-step model in 3 folds. In this paper, we propose a single-fold distillation algorithm, SFDDM, which can flexibly compress the teacher diffusion model into a student model with any desired number of steps, based on reparameterization of the intermediate inputs from the teacher model. To train the student diffusion model, we minimize not only the distance between the outputs but also the distance between the distributions of the hidden variables of the teacher and student models. Extensive experiments on four datasets demonstrate that a student model trained with SFDDM can sample high-quality data with the number of steps reduced to as little as approximately 1% of the teacher's, substantially reducing inference time. These results show that SFDDM effectively transfers knowledge in a single fold, achieving semantic consistency and meaningful image interpolation.
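As a quick check on the fold count quoted in the abstract, assume each fold of progressive binary distillation halves the number of sampling steps exactly. The number of folds needed to go from $T$ to $T'$ steps is then

$$
\text{folds} = \log_2\frac{T}{T'} = \log_2\frac{1024}{128} = 3,
$$

corresponding to the chain $1024 \rightarrow 512 \rightarrow 256 \rightarrow 128$, with one retraining round per arrow. Under the same halving assumption, only step counts of the form $T/2^k$ are reachable, whereas a single-fold scheme targets $T'$ directly.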

Paper Structure (25 sections, 1 theorem, 34 equations, 11 figures, 3 tables, 2 algorithms)

This paper contains 25 sections, 1 theorem, 34 equations, 11 figures, 3 tables, 2 algorithms.

Key Result

Lemma A.1

For the Markovian assumption on the forward process $q^{\prime}\left(\boldsymbol{x}^{\prime}_{1: T^{\prime}} \mid \boldsymbol{x}^{\prime}_0\right):=\prod_{t=1}^{T^{\prime}} q^{\prime}\left(\boldsymbol{x}^{\prime}_t \mid \boldsymbol{x}^{\prime}_{t-1}\right)$ of the student and …

Figures (11)

  • Figure 1: Single-Fold Distillation of Diffusion Model (SFDDM). The student accelerates inference by using a small number of steps $T^{\prime}$ instead of a large $T$. We use $T=9$ and $T^{\prime}=3$ in the figure for readability. To align the teacher and student Markov chains, we propose to match the intermediate hidden variables to make, e.g., $q^{\prime}(\boldsymbol{x}^{\prime}_2 = \boldsymbol{x}_6 \mid \boldsymbol{x}^{\prime}_0 = \boldsymbol{x}_0)$ equal to $q(\boldsymbol{x}_6 \mid \boldsymbol{x}_0)$.
  • Figure 2: FID under different numbers of sampling steps, with the teacher at $T=1024$, on four datasets.
  • Figure 3: FID of the methods with different numbers of sampling steps on CelebA-HQ.
  • Figure 4: Generated samples from SFDDM on different $T^{\prime}$.
  • Figure 5: Consistency on CelebA-HQ, LSUN-Bedroom and LSUN-Church: inputting the same noise.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Lemma A.1
  • proof
  • proof