Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

Mingyuan Zhou; Huangjie Zheng; Zhendong Wang; Mingzhang Yin; Hai Huang

TL;DR

SiD tackles the challenge of turning pretrained diffusion models into fast, data-free, one-step generators. It reframes forward diffusion as a semi-implicit distribution and derives three score identities that make score-based distillation tractable through a model-based Fisher divergence loss. Using model-based explicit score matching (MESM) and carefully crafted loss approximations, SiD trains the generator to match the teacher's score using only its own synthesized images. FID falls exponentially fast during distillation, and the resulting single-step generator often surpasses the teacher across multiple datasets. The result is a highly efficient diffusion-distillation pipeline with strong generation quality and broad applicability, albeit with higher memory requirements due to the additional score networks.
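To make the "model-based Fisher divergence" concrete, here is a minimal sketch in the one-dimensional Gaussian setting of Proposition 4. It assumes noising of the form x_t = x_g + sigma_t * eps and uses Tweedie's formula, S(x_t) = (f(x_t) - x_t) / sigma_t^2, so that each score is written through a denoiser f. The expectation is taken over the generator's own noisy samples, so no real data is used. The function names and the values theta = 1.5, sigma_t = 0.8 are illustrative assumptions, not part of the SiD code base.

```python
import random


def gaussian_denoiser(x_t: float, sigma: float, mean: float) -> float:
    """Posterior mean E[x_g | x_t] when x_g ~ N(mean, 1) and x_t = x_g + sigma * eps."""
    s2 = sigma * sigma
    return x_t / (1.0 + s2) + mean * s2 / (1.0 + s2)


def fisher_divergence_mc(theta: float, sigma: float, n: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of E_{x_t ~ p_theta}[(S_data(x_t) - S_theta(x_t))^2],
    with each score expressed through its denoiser: S(x_t) = (f(x_t) - x_t) / sigma^2."""
    rng = random.Random(seed)
    s2 = sigma * sigma
    total = 0.0
    for _ in range(n):
        x_g = rng.gauss(theta, 1.0)               # toy one-step generator output
        x_t = x_g + sigma * rng.gauss(0.0, 1.0)   # forward diffusion of the generated sample
        f_teacher = gaussian_denoiser(x_t, sigma, 0.0)  # stands in for the pretrained teacher
        f_gen = gaussian_denoiser(x_t, sigma, theta)    # generator-side denoiser f_psi at psi = theta
        # S_data - S_theta = (f_teacher - f_gen) / sigma^2; the x_t terms cancel
        total += ((f_teacher - f_gen) / s2) ** 2
    return total / n


theta, sigma = 1.5, 0.8
closed_form = theta**2 / (1.0 + sigma**2) ** 2
print(fisher_divergence_mc(theta, sigma), closed_form)
```

In this toy the score gap is the constant -theta / (1 + sigma_t^2), so the estimate matches theta^2 / (1 + sigma_t^2)^2 up to rounding and vanishes only at theta = 0, where the generator matches p_data.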

Abstract

We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By reformulating forward diffusion processes as semi-implicit distributions, we leverage three score-related identities to create a new loss mechanism. This mechanism achieves rapid FID reduction by training the generator on its own synthesized images, eliminating the need for real data or reverse-diffusion-based generation, all while significantly shortening generation time. Evaluated on four benchmark datasets, the SiD algorithm demonstrates high iteration efficiency during distillation and surpasses competing distillation approaches in generation quality, whether they are one-step or few-step, data-free or dependent on training data. This achievement redefines the benchmarks for efficiency and effectiveness not only in diffusion distillation but also in the broader field of diffusion-based generation. The PyTorch implementation is available at https://github.com/mingyuanzhou/SiD
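The semi-implicit view writes the diffused distribution as a mixture p(x_t) = ∫ q(x_t | x_g) p(x_g) dx_g with an explicit Gaussian kernel, so its score can be recovered from a denoiser even when p(x_g) itself has no tractable density. Below is a minimal check of this idea through Tweedie's formula (the theorem list includes a proof of it), assuming x_t = x_g + sigma * eps and, purely for illustration, a two-point mixing distribution x_g ∈ {-1, +1}, chosen so the marginal score can also be computed by finite differences. In SiD the mixing distribution would instead be the one-step generator; the function names here are hypothetical.

```python
import math


def log_marginal(x_t: float, sigma: float) -> float:
    """log p(x_t) up to an additive constant, for x_g uniform on {-1, +1}
    and x_t = x_g + sigma * eps (an equal-weight two-component Gaussian mixture)."""
    s2 = sigma * sigma
    a = -((x_t + 1.0) ** 2) / (2.0 * s2)
    b = -((x_t - 1.0) ** 2) / (2.0 * s2)
    m = max(a, b)
    # log-sum-exp for stability; normalizing constants do not affect the score
    return m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))


def score_finite_difference(x_t: float, sigma: float, h: float = 1e-5) -> float:
    """Central-difference approximation of d/dx_t log p(x_t)."""
    return (log_marginal(x_t + h, sigma) - log_marginal(x_t - h, sigma)) / (2.0 * h)


def score_from_denoiser(x_t: float, sigma: float) -> float:
    """Tweedie's formula: score = (E[x_g | x_t] - x_t) / sigma^2."""
    s2 = sigma * sigma
    posterior_mean = math.tanh(x_t / s2)  # E[x_g | x_t] for the two-point prior
    return (posterior_mean - x_t) / s2


for x in (-1.2, 0.3, 2.0):
    print(x, score_finite_difference(x, 0.7), score_from_denoiser(x, 0.7))
```

The two columns agree, even though the denoiser route never evaluates the marginal density, which is the property that makes score-based distillation of an implicit generator tractable.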

Paper Structure (19 sections, 3 theorems, 44 equations, 18 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Forward Diffusion as Semi-Implicit Distribution: Exploring Score Identities
Forward Diffusions and Semi-Implicit Distributions
Score Identities
SiD: Score identity Distillation
Model-based Explicit Score Matching (MESM)
Loss Approximation based on Identities 1 and 2
Loss Approximation via Projected Score Matching
Fused Loss of SiD
Noise Weighting and Scheduling
Experimental Results
Benchmark Performance
Conclusion
Ablation Study and Parameter Settings
...and 4 more sections

Key Result

Proposition 4

Suppose $p_\text{data}(x_0) = \mathcal{N}(0,1)$, $p_\text{data}(x_t)=\mathcal{N}(0,1+\sigma_t^2)$, $q(x_t\,|\, x_g) = \mathcal{N}(x_g,\sigma_t^2)$, and $p_\theta(x_g) = \mathcal{N}(\theta,1)$. Assume $\psi^*(\theta)=\theta$ and $f_{\psi}(x_t,t) = x_t(1+\sigma_t^2)^{-1} + \psi\,\sigma_t^2(1+\sigma_t^2)^{-1}$ …
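Proposition 4's assumption psi*(theta) = theta says that f_psi is the optimal denoiser for p_theta exactly when psi = theta. A minimal sketch that checks this numerically, assuming psi is fit by least-squares denoising on the generator's own noisy samples (a stand-in for training the additional score network). Because f_psi is affine in psi, the minimizer of the Monte Carlo loss has a closed form. All function names are hypothetical.

```python
import random


def f_psi(x_t: float, sigma: float, psi: float) -> float:
    """The denoiser family f_psi(x_t, t) from Proposition 4."""
    s2 = sigma * sigma
    return x_t / (1.0 + s2) + psi * s2 / (1.0 + s2)


def sample_pairs(theta: float, sigma: float, n: int = 100_000, seed: int = 0):
    """Draw (x_g, x_t) with x_g ~ N(theta, 1) and x_t = x_g + sigma * eps."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x_g = rng.gauss(theta, 1.0)
        pairs.append((x_g, x_g + sigma * rng.gauss(0.0, 1.0)))
    return pairs


def denoising_loss(psi: float, pairs, sigma: float) -> float:
    """Mean squared error (1/n) * sum (f_psi(x_t) - x_g)^2."""
    return sum((f_psi(x_t, sigma, psi) - x_g) ** 2 for x_g, x_t in pairs) / len(pairs)


def fit_psi(pairs, sigma: float) -> float:
    """Closed-form minimizer of denoising_loss: setting d/dpsi to zero gives
    psi_hat = (1 + sigma^2) / sigma^2 * mean(x_g - x_t / (1 + sigma^2))."""
    s2 = sigma * sigma
    mean_residual = sum(x_g - x_t / (1.0 + s2) for x_g, x_t in pairs) / len(pairs)
    return (1.0 + s2) / s2 * mean_residual


theta, sigma = 1.5, 0.8
pairs = sample_pairs(theta, sigma)
psi_hat = fit_psi(pairs, sigma)
print(psi_hat, denoising_loss(psi_hat, pairs, sigma))
```

With 100,000 samples the fitted psi_hat lands close to theta = 1.5, consistent with psi*(theta) = theta.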

Figures (18)

Figure 1: Rapid progress in distilling a pretrained ImageNet 64x64 diffusion model with the proposed SiD method, using $\alpha=1.0$, a batch size of 1024, and a learning rate of 5e-6. The images are generated from the same set of random noises after training the SiD generator on 0, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, and 50 million synthesized images, arranged from top left to bottom right. These counts correspond to roughly 0, 100, 200, 500, 1K, 2K, 5K, 10K, 20K, and 49K training iterations, respectively, and the associated FIDs are 153.52, 34.83, 37.42, 18.08, 10.82, 7.74, 5.94, 4.49, 3.40, and 3.07. The full FID progression is detailed in the Appendix (Fig. fig:imagenet_1024).
Figure 1: Comparison of various deep generative models trained on CIFAR-10 without label conditioning. The best and second-best one/few-step generators under the FID or IS metric are highlighted with bold and italic bold, respectively.
Figure 2: Analogous to the CIFAR-10 unconditional comparison table, for CIFAR-10 with label conditioning.
Figure 3: Analogous to the CIFAR-10 unconditional comparison table, for ImageNet 64x64 with label conditioning; Precision and Recall metrics are also included.
Figure 4: Analogous to the CIFAR-10 unconditional comparison table, for FFHQ 64x64.
...and 13 more figures
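The Figure 1 caption converts synthesized-image counts into training iterations at a batch size of 1024. A short sketch of that arithmetic, pairing each checkpoint with the FID the caption reports (the constant and function names are illustrative):

```python
BATCH_SIZE = 1024  # batch size stated in the Figure 1 caption

# Checkpoints (millions of synthesized images) and FIDs as listed in the caption
IMAGES_MILLIONS = [0, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50]
FIDS = [153.52, 34.83, 37.42, 18.08, 10.82, 7.74, 5.94, 4.49, 3.40, 3.07]


def iterations_for(images_millions: float, batch_size: int = BATCH_SIZE) -> float:
    """Number of generator updates needed to consume the given number of images."""
    return images_millions * 1_000_000 / batch_size


for m, fid in zip(IMAGES_MILLIONS, FIDS):
    print(f"{m:>5} M images ~ {iterations_for(m):>8.0f} iterations, FID {fid}")
```

For example, 50 million images at 1024 per batch is about 48,828 iterations, which the caption rounds to 49K, and 0.1 million images is about 98, rounded to 100.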

Theorems & Definitions (6)

Proposition 4: An example failure case
Theorem 5: Projected Score Matching
Proposition 6
Proof: Tweedie's formula
Proof: the projected score identity
Proof: Theorem 5 (Projected Score Matching)
