Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models

Neta Shaul, Uriel Singer, Ricky T. Q. Chen, Matthew Le, Ali Thabet, Albert Pumarola, Yaron Lipman

TL;DR

Bespoke Non-Stationary (BNS) Solvers are built on a family of non-stationary solvers that provably subsumes existing numerical ODE solvers, and consequently achieve considerably better sample approximation (PSNR) than these baselines.
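The claim that non-stationary (NS) solvers subsume classical ones can be illustrated with the simplest case. This is a minimal sketch under an assumption: based on the coefficient shapes in Proposition 3.0, a generic NS step is taken to be $x_{i+1} = \sum_{j\le i} c_i[j]\,x_j + \sum_{j\le i} d_i[j]\,u_j$ with $u_j = u(t_j, x_j)$. The helper names (`ns_step`, `sample_ns`, `euler_as_ns`) and the toy velocity field are hypothetical. Choosing $c_i$ to select $x_i$ and $d_i$ to put the step size on $u_i$ reduces the NS step to RK-Euler.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Velocity = Callable[[float, Vector], Vector]


def ns_step(xs: Sequence[Vector], us: Sequence[Vector],
            c: Sequence[float], d: Sequence[float]) -> Vector:
    """Generic NS step (assumed form): x_{i+1} = sum_j c[j] x_j + sum_j d[j] u_j."""
    out = [0.0] * len(xs[0])
    for weight, vec in list(zip(c, xs)) + list(zip(d, us)):
        for k, value in enumerate(vec):
            out[k] += weight * value
    return out


def sample_ns(u: Velocity, x0: Vector,
              coeffs: Sequence[Tuple[float, List[float], List[float]]]) -> Vector:
    """Run an NS solver; coeffs[i] = (t_i, c_i, d_i) with len(c_i) == len(d_i) == i + 1."""
    xs: List[Vector] = [list(x0)]
    us: List[Vector] = []
    for t_i, c_i, d_i in coeffs:
        us.append(u(t_i, xs[-1]))  # one velocity evaluation (NFE) per step
        xs.append(ns_step(xs, us, c_i, d_i))
    return xs[-1]


def euler_as_ns(times: Sequence[float]) -> List[Tuple[float, List[float], List[float]]]:
    """RK-Euler as NS coefficients: c_i selects x_i, d_i puts the step size h_i on u_i."""
    coeffs = []
    for i in range(len(times) - 1):
        h = times[i + 1] - times[i]
        coeffs.append((times[i], [0.0] * i + [1.0], [0.0] * i + [h]))
    return coeffs


def euler(u: Velocity, x0: Vector, times: Sequence[float]) -> Vector:
    """Plain RK-Euler, for comparison."""
    x = list(x0)
    for t0, t1 in zip(times[:-1], times[1:]):
        v = u(t0, x)
        x = [xk + (t1 - t0) * vk for xk, vk in zip(x, v)]
    return x


def toy_velocity(t: float, x: Vector) -> Vector:
    # Hypothetical 2-D velocity field, used only for this comparison.
    return [-x[0], 2.0 * t * x[1]]


times = [0.0, 0.25, 0.5, 0.75, 1.0]
x_ns = sample_ns(toy_velocity, [1.0, 0.5], euler_as_ns(times))
x_euler = euler(toy_velocity, [1.0, 0.5], times)
```

Both runs produce the same final state. Under this reading, a bespoke NS solver differs only in that its coefficients are optimized rather than fixed by a textbook scheme.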

Abstract

This paper introduces Bespoke Non-Stationary (BNS) Solvers, a solver distillation approach to improve the sample efficiency of Diffusion and Flow models. BNS solvers are built on a family of non-stationary solvers that provably subsumes existing numerical ODE solvers, and consequently achieve considerably better sample approximation (PSNR) than these baselines. Compared to model distillation, BNS solvers have a tiny parameter space (fewer than 200 parameters), optimize fast (two orders of magnitude faster), and maintain sample diversity. In contrast to previous solver distillation approaches, they nearly close the gap to standard distillation methods such as Progressive Distillation in the low-to-medium NFE regime. For example, a BNS solver achieves 45 dB PSNR / 1.76 FID using 16 NFE on class-conditional ImageNet-64. We experimented with BNS solvers for conditional image generation, text-to-image generation, and text-to-audio generation, showing significant improvement in sample approximation (PSNR) in all three.
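PSNR here measures how closely a few-step solver reproduces the sample that a high-accuracy ground-truth solver (Adaptive RK45 in Figure 1) produces from the same noise. The sketch below implements the standard formula $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$ on flattened pixel lists; the `psnr` helper is hypothetical, and the $[0, 1]$ data range (`max_value=1.0`) is an assumption rather than the paper's stated convention.

```python
import math
from typing import Sequence


def psnr(approx: Sequence[float], reference: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value**2 / MSE).

    `approx` is the few-step solver sample and `reference` the ground-truth
    sample from the same noise, both flattened. max_value=1.0 assumes pixels
    in [0, 1] (an assumption, not the paper's stated convention).
    """
    if len(approx) != len(reference) or len(reference) == 0:
        raise ValueError("samples must be non-empty and the same length")
    mse = sum((a - r) ** 2 for a, r in zip(approx, reference)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical samples
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [0.0, 0.5, 1.0, 0.25]
approx = [r + 0.01 for r in reference]  # every pixel off by 0.01
print(round(psnr(approx, reference), 6))  # prints 40.0
```

Under the same assumed $[0, 1]$ range, the 45 dB quoted above corresponds to a root-mean-square error of about $10^{-45/20} \approx 0.0056$ per pixel.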

Paper Structure

This paper contains 42 sections, 5 theorems, 48 equations, 12 figures, 8 tables, 2 algorithms.

Key Result

Proposition 3.0

For every update rule $(c_i,d_i)\in \mathbb R^{i+1}\times\mathbb R^{i+1}$ of an NS solver, there exists a pair $(a_i, b_i)\in \mathbb{R}\times\mathbb{R}^{i+1}$ such that the update rule can be equivalently written as

$$x_{i+1} = a_i x_0 + U_i b_i.$$

Furthermore, if the columns of $U_i$ are linearly independent, then the pair $(a_i, b_i)$ is unique.
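Proposition 3.0 can be checked numerically. This sketch assumes the NS update is $x_{i+1}=\sum_{j\le i}c_i[j]\,x_j+\sum_{j\le i}d_i[j]\,u_j$ with $U_i=[u_0,\dots,u_i]$. It obtains $(a_i,b_i)$ by induction on $i$: each earlier state $x_j$ is replaced by its already-converted form $a_{j-1}x_0 + U_{j-1}b_{j-1}$. The function names, toy velocity field, and random coefficients are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def ns_to_x0_form(cs: Sequence[Sequence[float]],
                  ds: Sequence[Sequence[float]]) -> List[Tuple[float, Vector]]:
    """Convert NS coefficients (c_i, d_i) to (a_i, b_i) with x_{i+1} = a_i x_0 + U_i b_i."""
    alpha: List[float] = [1.0]   # x_0 = 1 * x_0 (no velocity terms)
    beta: List[Vector] = [[]]    # beta[j] holds the weights of u_0..u_{j-1} in x_j
    result = []
    for i, (c, d) in enumerate(zip(cs, ds)):
        # Substitute x_j = alpha[j] x_0 + U_{j-1} beta[j] into sum_j c[j] x_j.
        a = sum(c[j] * alpha[j] for j in range(i + 1))
        b = [float(v) for v in d]  # direct weights on u_0..u_i
        for j in range(i + 1):
            for k, weight in enumerate(beta[j]):
                b[k] += c[j] * weight
        alpha.append(a)
        beta.append(b)
        result.append((a, b))
    return result


def run_ns(u: Callable[[float, Vector], Vector], x0: Vector, times: Sequence[float],
           cs: Sequence[Sequence[float]], ds: Sequence[Sequence[float]]
           ) -> Tuple[List[Vector], List[Vector]]:
    """Run x_{i+1} = sum_j c_i[j] x_j + sum_j d_i[j] u_j; return all states and velocities."""
    xs: List[Vector] = [list(x0)]
    us: List[Vector] = []
    for i, t in enumerate(times):
        us.append(u(t, xs[-1]))
        xs.append([sum(cs[i][j] * xs[j][k] for j in range(i + 1))
                   + sum(ds[i][j] * us[j][k] for j in range(i + 1))
                   for k in range(len(x0))])
    return xs, us


def toy_velocity(t: float, x: Vector) -> Vector:
    # Hypothetical 2-D velocity field, used only for this check.
    return [math.sin(t) * x[1], -x[0] + t]


rng = random.Random(0)
n = 4
times = [i / n for i in range(n)]
cs = [[rng.uniform(-1.0, 1.0) for _ in range(i + 1)] for i in range(n)]
ds = [[rng.uniform(-1.0, 1.0) for _ in range(i + 1)] for i in range(n)]
x0 = [0.3, -0.7]

xs, us = run_ns(toy_velocity, x0, times, cs, ds)
for i, (a, b) in enumerate(ns_to_x0_form(cs, ds)):
    # Rebuild x_{i+1} from x_0 and the velocities alone.
    rebuilt = [a * x0[k] + sum(b[j] * us[j][k] for j in range(i + 1))
               for k in range(len(x0))]
    assert all(abs(p - q) < 1e-9 for p, q in zip(rebuilt, xs[i + 1]))
```

Counting only the $(a_i, b_i)$ entries over $n$ steps gives $n + n(n+1)/2$ coefficients, i.e. 152 for $n = 16$. How the paper parametrizes the time steps on top of this is not shown here, so this is only a rough consistency check against the abstract's bound of fewer than 200 parameters.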

Figures (12)

  • Figure 1: Different solvers on an FM-OT 512$\times$512 Text-to-Image model with guidance scale $2$, initialized with the same noise (from left to right): Ground truth (Adaptive RK45), BNS 16 NFE (this paper), RK-Midpoint 16 NFE, and RK-Euler 16 NFE. Note the fidelity of BNS compared to GT. The rows correspond to the captions (top to bottom): "A husky facing the camera.", "sunflowers in a clear glass vase on a desk.", "the cat is sitting on the floor beside a pair of tennis shoes.".
  • Figure 2: Setup.
  • Figure 3: Taxonomy of ODE solvers used for sampling of diffusion/flow generative models.
  • Figure 4: BNS solvers vs. BST solvers, RK-Midpoint/Euler, DDIM, and DPM++ on ImageNet-64 and ImageNet-128: PSNR vs. NFE (top row) and FID vs. NFE (bottom row).
  • Figure 5: BNS vs. RK-Midpoint on latent FM-OT Text-to-Image 512x512: (left) guidance scale $2.0$ with the caption "a building is shown behind trees and shrubs.", (right) guidance scale $6.5$ with the caption "panda bear sitting in tree with no leaves."
  • ...and 7 more figures

Theorems & Definitions (8)

  • Proposition 3.0
  • Theorem 3.1: Solver Taxonomy
  • Proposition 1.0
  • Proof of Proposition (label: prop:ns_param)
  • Lemma 2.1
  • Proof of Lemma (label: lem:st_subsume_genric_ei)
  • Theorem 2.1: Solver Taxonomy
  • Proof of Theorem (label: thm:universality)