
Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction

Yifei Wang, Weimin Bai, Colin Zhang, Debing Zhang, Weijian Luo, He Sun

TL;DR

Uni-Instruct introduces a diffusion-divergence framework that unifies more than 10 one-step diffusion distillation methods via a diffusion expansion of the $f$-divergence. It derives tractable, gradient-equivalent losses by combining Diff-Instruct (DI) and Score Implicit Matching (SIM) components and by estimating density ratios with a GAN-based discriminator, enabling practical training of one-step generators. The framework recovers prior methods as special cases and delivers state-of-the-art one-step FID scores on CIFAR-10 and ImageNet-$64\times 64$, as well as plausible text-to-3D results. By bridging KL-based and score-based divergences, Uni-Instruct provides a principled path toward efficient diffusion distillation and broader knowledge transfer across diffusion models.
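The density-ratio step rests on the standard classifier trick: a discriminator $D$ trained with balanced binary cross-entropy to separate samples of $p$ (label 1) from samples of $q$ (label 0) has optimum $D^*(x) = p(x)/(p(x)+q(x))$, so $\log\big(p(x)/q(x)\big)$ is exactly the logit of $D^*$. The sketch below is a toy stand-in, not Uni-Instruct's GAN discriminator: it assumes two 1D Gaussians and replaces the neural network with logistic regression on the features $[1, x, x^2]$ (exact for Gaussian pairs), fitted by Newton's method. All function names are hypothetical.

```python
import numpy as np


def gauss_logpdf(x, m, s):
    """Log-density of N(m, s^2) evaluated at x."""
    return -0.5 * ((x - m) / s) ** 2 - np.log(s) - 0.5 * np.log(2.0 * np.pi)


def features(x):
    """Hypothetical feature map [1, x, x^2]; exact for Gaussian log-ratios."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x**2], axis=-1)


def fit_discriminator(xp, xq, iters=25):
    """Logistic-regression discriminator: label 1 for p-samples, 0 for q-samples.

    Minimizes mean binary cross-entropy with Newton's method. Equal sample
    counts are assumed, so the optimum satisfies logit D*(x) = log p(x) - log q(x).
    """
    phi = features(np.concatenate([xp, xq]))
    y = np.concatenate([np.ones(len(xp)), np.zeros(len(xq))])
    w = np.zeros(phi.shape[1])
    for _ in range(iters):
        d = 0.5 * (1.0 + np.tanh(0.5 * (phi @ w)))  # sigmoid, overflow-free
        grad = phi.T @ (d - y) / len(y)  # gradient of the mean BCE loss
        hess = (phi * (d * (1.0 - d))[:, None]).T @ phi / len(y)
        w -= np.linalg.solve(hess, grad)  # Newton step
    return w


def log_density_ratio(w, x):
    """Estimated log p(x)/q(x), read off as the discriminator's logit."""
    return features(x) @ w


rng = np.random.default_rng(0)
xp = rng.normal(0.0, 1.0, size=200_000)  # samples of p = N(0, 1)
xq = rng.normal(1.0, 1.5, size=200_000)  # samples of q = N(1, 1.5^2)
w = fit_discriminator(xp, xq)

grid = np.array([-1.0, 0.0, 1.0])
est = log_density_ratio(w, grid)
true = gauss_logpdf(grid, 0.0, 1.0) - gauss_logpdf(grid, 1.0, 1.5)
print(np.round(est, 3), np.round(true, 3))
```

Swapping the linear model for a neural network gives the usual GAN-style discriminator; the ratio is read off the same logit, but only approximately, since a network trained on finite data does not reach the exact optimum.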

Abstract

In this paper, we unify more than 10 existing one-step diffusion distillation approaches, such as Diff-Instruct, DMD, SIM, SiD, and $f$-distill, within a theory-driven framework that we name Uni-Instruct. Uni-Instruct is motivated by our proposed diffusion expansion theory of the $f$-divergence family. We then introduce key theoretical results that overcome the intractability of the original expanded $f$-divergence, yielding an equivalent yet tractable loss that effectively trains one-step diffusion models by minimizing the expanded $f$-divergence family. The unification introduced by Uni-Instruct not only offers new theoretical insight into existing approaches from a high-level perspective but also leads to state-of-the-art one-step diffusion generation performance. On the CIFAR-10 generation benchmark, Uni-Instruct achieves record-breaking Fréchet Inception Distance (FID) values of 1.46 for unconditional generation and 1.38 for conditional generation. On the ImageNet-$64\times 64$ generation benchmark, Uni-Instruct achieves a new state-of-the-art one-step FID of 1.02, outperforming its 79-step teacher diffusion model by a margin of 1.33 (1.02 vs. 2.35). We also apply Uni-Instruct to broader tasks such as text-to-3D generation, where it gives decent results that slightly outperform previous methods such as SDS and VSD in both generation quality and diversity. Both the theoretical and empirical contributions of Uni-Instruct may help future studies on one-step diffusion distillation and knowledge transfer of diffusion models.

Paper Structure

This paper contains 51 sections, 7 theorems, 54 equations, 10 figures, 9 tables, and 2 algorithms.

Key Result

Theorem 3.1

Assume $p$ and $q$ are distributions that both evolve along the same forward diffusion SDE. Then the $f$-divergence between $p$ and $q$ admits an equivalent diffusion expansion.
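A standard identity of this kind, written in our own notation and not necessarily in the paper's exact form, uses $D_f(p\|q) = \mathbb{E}_{q}[f(p/q)]$: if $p_t$ and $q_t$ both follow $\mathrm{d}x = \mu(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w$ and $r_t = p_t/q_t$, then $\frac{\mathrm{d}}{\mathrm{d}t} D_f(p_t\|q_t) = -\frac{g(t)^2}{2}\,\mathbb{E}_{p_t}\big[r_t f''(r_t)\,\|\nabla\log p_t - \nabla\log q_t\|^2\big]$. For KL, $f(r) = r\log r$ gives $r f''(r) = 1$, so $D_{\mathrm{KL}}(p_0\|q_0) = \int_0^\infty \frac{g(t)^2}{2}\,\mathbb{E}_{p_t}\|\nabla\log p_t - \nabla\log q_t\|^2\,\mathrm{d}t$ whenever the divergence vanishes as $t\to\infty$. The sketch below checks this KL case numerically, assuming a zero-drift variance-exploding SDE and 1D Gaussian data so that every score is closed form; the helper names are hypothetical.

```python
import numpy as np


def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL(N(m1, s1^2) || N(m2, s2^2))."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5


def expanded_kl(m1, s1, m2, s2, n=200_001):
    """KL(p_0 || q_0) as a time integral of score differences (hypothetical helper).

    Zero-drift VE SDE: after adding noise variance u = sigma(t)^2 the marginals
    are p_u = N(m1, s1^2 + u) and q_u = N(m2, s2^2 + u), and g(t)^2 dt = du, so
    KL = int_0^inf 0.5 * E_{x~p_u}[(score_p(x) - score_q(x))^2] du.
    """
    # Map u in [0, inf) to v in [0, 1) via u = v / (1 - v), du = dv / (1 - v)^2;
    # stop just short of v = 1, where the map is singular.
    v = np.linspace(0.0, 1.0 - 1e-9, n)
    u = v / (1.0 - v)
    a = s1**2 + u  # variance of p_u
    b = s2**2 + u  # variance of q_u
    # score_p - score_q = -(x - m1)/a + (x - m2)/b; its second moment under
    # x ~ N(m1, a) is a * (1/b - 1/a)^2 + (m1 - m2)^2 / b^2.
    fisher = a * (1.0 / b - 1.0 / a) ** 2 + (m1 - m2) ** 2 / b**2
    integrand = 0.5 * fisher / (1.0 - v) ** 2
    # Trapezoidal rule, written out to avoid NumPy-version differences.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(v)))


for params in [(0.0, 1.0, 1.5, 2.0), (0.0, 1.0, 0.0, 0.5)]:
    print(params, kl_gauss(*params), expanded_kl(*params))
```

Other members of the family change only the weight $r_t f''(r_t)$, which for general $f$ depends on the unknown density ratio; this is where a ratio estimator such as a discriminator becomes necessary.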

Figures (10)

  • Figure 1: Left: Conceptual overview of Uni-Instruct, which unifies more than 10 existing diffusion distillation methods across a wide range of applications (details in Table TAB:TEASER). Right: selected FID scores of different models on the ImageNet-$64\times 64$ conditional generation benchmark.
  • Figure 2: Generated samples from Uni-Instruct one-step generators distilled from pre-trained diffusion models on different datasets. Left: CIFAR-10 (unconditional); Middle: CIFAR-10 (conditional); Right: ImageNet-$64\times 64$ (conditional).
  • Figure 2: Comparison of image generation on CIFAR-10 (unconditional). The best one/few-step generator under the FID metric is highlighted in bold. F.S. denotes training from scratch; L.T. denotes resumed, longer training.
  • Figure 3: Prompt: "A refined vase with artistic patterns." Left: ProlificDreamer; Right: Uni-Instruct (forward KL). Our vases show more diverse shapes and more realistic patterns.
  • Figure 4: Prompt: "A refined vase with artistic patterns." From top to bottom: ProlificDreamer, Uni-Instruct (forward KL), Uni-Instruct (reverse KL). Our vases show more diverse shapes and more realistic patterns.
  • ...and 5 more figures

Theorems & Definitions (8)

  • Theorem 3.1: Diffusion Expansion of $f$-Divergence
  • Theorem 3.2: Gradient Equality Theorem of the Expanded $f$-divergence
  • Remark 3.3
  • Corollary 3.4
  • Lemma B.1: Calculate the gradient of $\bm{x} \sim p_{\theta,t}$ xu2025one
  • Lemma B.2: Calculate the gradient of the score function luo2025one
  • Theorem B.3: Density Ratio Representation
  • Lemma B.4: Optimal Discriminator Characterization