SFDDM: Single-fold Distillation for Diffusion models

Chi Hong; Jiyue Huang; Robert Birke; Dick Epema; Stefanie Roos; Lydia Y. Chen

TL;DR

This paper tackles the slow inference of diffusion models caused by large sampling step counts $T$. It introduces SFDDM, a single-fold distillation framework that constructs a forward process for a student diffusion model aligned to a $T$-step teacher by sampling a subsequence with ratio $c=T/T'$, and trains the student to match both outputs and hidden-variable distributions. The method supports distillation to arbitrary target steps $T'$, including very small ones, and extends to flexible subsequences via an increasing index set $\{\phi_0,...,\phi_{T'}\}$. Empirical results on CIFAR-10, CelebA-HQ, LSUN-Church, and LSUN-Bedroom show SFDDM achieving state-of-the-art FID scores and enabling data generation with as few as ~1% of the teacher steps, while preserving semantic consistency and meaningful interpolation between samples.
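The subsequence construction described above can be illustrated with a small sketch. It is not the authors' code: the function names `uniform_subsequence` and `is_valid_subsequence` are hypothetical, the uniform case assumes $T'$ divides $T$ so that $c$ is an integer, and the validity check only tests the "increasing" property stated in the summary.

```python
def uniform_subsequence(T: int, T_student: int) -> list[int]:
    """Teacher indices phi_0..phi_{T'} for a uniform ratio c = T / T'.

    Assumption for this sketch: T' is positive and divides T exactly.
    """
    if T_student <= 0 or T % T_student != 0:
        raise ValueError("this sketch assumes T' > 0 divides T")
    c = T // T_student
    return [c * t for t in range(T_student + 1)]


def is_valid_subsequence(phi: list[int], T: int) -> bool:
    """True if phi is strictly increasing and every index lies in 0..T."""
    in_range = all(0 <= p <= T for p in phi)
    increasing = all(a < b for a, b in zip(phi, phi[1:]))
    return in_range and increasing


# The T = 9, T' = 3 setting used for readability in Figure 1.
phi = uniform_subsequence(T=9, T_student=3)
print(phi)  # [0, 3, 6, 9]
```

A non-uniform index set such as `[0, 1, 4, 9]` also passes `is_valid_subsequence`, which is the kind of flexible subsequence the summary refers to.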

Abstract

While diffusion models generate remarkable synthetic images, a key limitation is their inference inefficiency: sampling requires numerous steps. To accelerate inference while maintaining high-quality synthesis, teacher-student distillation has been applied to compress diffusion models in a progressive, binary manner by retraining, e.g., reducing a 1024-step model to a 128-step model in 3 folds. In this paper, we propose a single-fold distillation algorithm, SFDDM, which can flexibly compress the teacher diffusion model into a student model with any desired number of steps, based on reparameterization of the intermediate inputs from the teacher model. To train the student diffusion model, we minimize not only the distance between the outputs but also the distance between the hidden-variable distributions of the teacher and student models. Extensive experiments on four datasets demonstrate that a student model trained by the proposed SFDDM can sample high-quality data with the number of steps reduced to as few as approximately 1% of the teacher's, reducing inference time accordingly. These results show that SFDDM effectively transfers knowledge in single-fold distillation, achieving semantic consistency and meaningful image interpolation.
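The "3 folds" figure for progressive binary distillation follows from halving the step count once per fold. The sketch below only does that arithmetic; `binary_folds` is a hypothetical helper, not part of any distillation library.

```python
def binary_folds(teacher_steps: int, student_steps: int) -> int:
    """Count the halving rounds needed to go from teacher_steps to student_steps.

    Raises ValueError when repeated halving cannot land exactly on
    student_steps, i.e. the target is not reachable by binary distillation.
    """
    folds = 0
    steps = teacher_steps
    while steps > student_steps:
        if steps % 2 != 0:
            raise ValueError("step count cannot be halved evenly")
        steps //= 2
        folds += 1
    if steps != student_steps:
        raise ValueError("target is not reachable by repeated halving")
    return folds


print(binary_folds(1024, 128))  # 3
```

A target such as 10 steps (roughly 1% of 1024) raises `ValueError` here, since halving passes from 16 to 8 without hitting it. Single-fold distillation to an arbitrary $T'$ is meant to remove that restriction.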

Paper Structure (25 sections, 1 theorem, 34 equations, 11 figures, 3 tables, 2 algorithms)

Introduction
Related studies
Single-fold distillation
Preliminary
Single-fold Distilled Diffusion (SFDDM)
The forward process of the student model
The reverse process of the student model
Distillation procedure
Distillation on flexible sub-sequence
Evaluation
Sampling quality and efficiency
Distillation with different sub-sequences
Consistency between teacher and student
Interpolation on the teacher and the student
Limitations
...and 10 more sections

Key Result

Lemma A.1

For the Markovian assumption on the forward process $q^{\prime}\left(\boldsymbol{x}^{\prime}_{1: T^{\prime}} \mid \boldsymbol{x}^{\prime}_0\right):=\prod_{t=1}^{T^{\prime}} q^{\prime}\left(\boldsymbol{x}^{\prime}_t \mid \boldsymbol{x}^{\prime}_{t-1}\right)$ of the student and …

Figures (11)

Figure 1: Single-Fold Distillation of Diffusion Model (SFDDM). The student accelerates inference by using a small number of steps $T^{\prime}$ instead of a large $T$. We use $T=9$ and $T^{\prime}=3$ in the figure for readability. To align the teacher and student Markov chains, we propose to match the intermediate hidden variables to make, e.g., $q^{\prime}(\boldsymbol{x}^{\prime}_2 = \boldsymbol{x}_6 \mid \boldsymbol{x}^{\prime}_0 = \boldsymbol{x}_0)$ equal to $q(\boldsymbol{x}_6 \mid \boldsymbol{x}_0)$.
Figure 2: FID under different numbers of sampling steps, distilled from a teacher with $T=1024$, on four datasets.
Figure 3: FID of the methods with different numbers of sampling steps on CelebA-HQ.
Figure 4: Generated samples from SFDDM for different $T^{\prime}$.
Figure 5: Consistency on CelebA-HQ, LSUN-Bedroom and LSUN-Church: inputting the same noise.
...and 6 more figures
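
The alignment in the Figure 1 caption can be written out as a worked instance. This assumes a DDPM-style Gaussian forward marginal with cumulative signal rate $\bar{\alpha}_t$ for the teacher and $\bar{\alpha}^{\prime}_t$ for the student. That notation does not appear in this summary and is introduced here only for illustration.

```latex
% Assumption: both chains have DDPM-style Gaussian marginals given the clean sample.
\begin{align}
q(\boldsymbol{x}_t \mid \boldsymbol{x}_0)
  &= \mathcal{N}\!\left(\sqrt{\bar{\alpha}_t}\,\boldsymbol{x}_0,\ (1-\bar{\alpha}_t)\boldsymbol{I}\right), \\
c &= T/T^{\prime} = 9/3 = 3,
  \qquad \boldsymbol{x}^{\prime}_t \leftrightarrow \boldsymbol{x}_{ct}, \\
q^{\prime}(\boldsymbol{x}^{\prime}_2 \mid \boldsymbol{x}^{\prime}_0)
  = q(\boldsymbol{x}_6 \mid \boldsymbol{x}_0)
  &\iff \bar{\alpha}^{\prime}_2 = \bar{\alpha}_6 .
\end{align}
```

Under that assumption, matching the hidden-variable distributions at student step 2 amounts to the student inheriting the teacher's cumulative noise level at step 6.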

Theorems & Definitions (3)

Lemma A.1
proof
proof
