SFDDM: Single-fold Distillation for Diffusion models
Chi Hong, Jiyue Huang, Robert Birke, Dick Epema, Stefanie Roos, Lydia Y. Chen
TL;DR
This paper tackles the slow inference of diffusion models caused by large sampling step counts $T$. It introduces SFDDM, a single-fold distillation framework that constructs a forward process for a student diffusion model aligned to a $T$-step teacher by sampling a subsequence with ratio $c=T/T'$, and trains the student to match both outputs and hidden-variable distributions. The method supports distillation to arbitrary target steps $T'$, including very small ones, and extends to flexible subsequences via an increasing index set $\{\phi_0,...,\phi_{T'}\}$. Empirical results on CIFAR-10, CelebA-HQ, LSUN-Church, and LSUN-Bedroom show SFDDM achieving state-of-the-art FID scores and enabling data generation with as few as ~1% of the teacher steps, while preserving semantic consistency and meaningful interpolation between samples.
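The subsequence construction can be illustrated with a minimal sketch. It assumes that $T$ is divisible by $T'$, that the index set is the evenly spaced choice $\phi_i = c \cdot i$, that the teacher uses a linear $\beta$ schedule, and that the student's cumulative noise level at step $i$ is taken from the teacher at step $\phi_i$. The function names and schedule constants are hypothetical and are not taken from the paper.

```python
def subsequence_indices(T, T_student):
    """Increasing index set phi_0..phi_{T'} with phi_i = c * i, where c = T / T'."""
    if T % T_student != 0:
        raise ValueError("this sketch assumes T is divisible by T'")
    c = T // T_student
    return [c * i for i in range(T_student + 1)]


def teacher_alpha_bar(T, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products alpha_bar_0..alpha_bar_T for an assumed linear beta schedule (T > 1)."""
    alpha_bar = [1.0]  # alpha_bar_0 = 1: no noise has been added at step 0
    for t in range(1, T + 1):
        beta_t = beta_start + (beta_end - beta_start) * (t - 1) / (T - 1)
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta_t))
    return alpha_bar


def student_schedule(T, T_student):
    """Subsample the teacher's cumulative schedule at the indices phi (an assumption)."""
    phi = subsequence_indices(T, T_student)
    alpha_bar = teacher_alpha_bar(T)
    student_alpha_bar = [alpha_bar[p] for p in phi]
    # Per-step student alphas implied by consecutive subsampled cumulative products.
    student_alpha = [
        student_alpha_bar[i] / student_alpha_bar[i - 1]
        for i in range(1, T_student + 1)
    ]
    return phi, student_alpha_bar, student_alpha


phi, student_alpha_bar, student_alpha = student_schedule(T=1000, T_student=10)
print(phi)  # indices 0, 100, ..., 1000, so the student uses 1% of the teacher steps
```

A flexible subsequence would replace `subsequence_indices` with any increasing list that starts at 0 and ends at $T$; the rest of the sketch is unchanged.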
Abstract
While diffusion models generate remarkable synthetic images, a key limitation is inefficient inference, which requires numerous sampling steps. To accelerate inference while maintaining high-quality synthesis, teacher-student distillation has been applied to compress diffusion models progressively, halving the number of steps with each round of retraining, e.g., reducing a 1024-step model to a 128-step model in 3 folds. In this paper, we propose a single-fold distillation algorithm, SFDDM, which can flexibly compress the teacher diffusion model into a student model with any desired number of steps, based on a reparameterization of the intermediate inputs from the teacher model. To train the student diffusion model, we minimize not only the output distance but also the distance between the distributions of the hidden variables of the teacher and student models. Extensive experiments on four datasets demonstrate that a student model trained with SFDDM can sample high-quality data with the number of steps reduced to as little as approximately 1% of the teacher's, thereby reducing inference time. These results show that SFDDM effectively transfers knowledge in a single fold of distillation, achieving semantic consistency and meaningful image interpolation.
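The fold arithmetic in the abstract can be checked with a short sketch. It assumes that each progressive fold exactly halves the step count, which is how the 1024-to-128 example works out to 3 folds; the function name is hypothetical.

```python
def progressive_schedule(teacher_steps, student_steps):
    """Step counts visited by binary progressive distillation (halving per fold)."""
    if student_steps <= 0 or teacher_steps < student_steps:
        raise ValueError("need 0 < student_steps <= teacher_steps")
    schedule = [teacher_steps]
    while schedule[-1] > student_steps:
        current = schedule[-1]
        # Halving must stay integral and must not skip past the target.
        if current % 2 != 0 or current // 2 < student_steps:
            raise ValueError(
                f"{student_steps} steps cannot be reached from {teacher_steps} by halving"
            )
        schedule.append(current // 2)
    return schedule


schedule = progressive_schedule(1024, 128)
print(schedule, "folds:", len(schedule) - 1)  # [1024, 512, 256, 128] folds: 3
```

Under the same assumption, a target such as 100 steps raises an error because halving 1024 never lands on it, whereas single-fold distillation trains one student directly at the desired step count.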
