Optimal Stepsize for Diffusion Sampling

Jianning Pei, Han Hu, Shuyang Gu

TL;DR

Diffusion sampling discretization creates a speed–fidelity trade-off that worsens with few steps. The paper introduces Optimal Stepsize Distillation (OSS), a dynamic-programming framework that distills a theoretically optimal stepsize schedule from a high-step teacher to a low-step student, minimizing global error under a fixed step budget $M$ and providing stability across architectures and solvers. OSS is architecture-agnostic, compatible with various direction strategies, and augmented by per-step amplitude calibration to mitigate low-step drift; it achieves up to $10\times$ acceleration with minimal loss of fidelity on GenEval benchmarks. This work offers a practical, plug-and-play pathway to deploy latency-efficient diffusion inference without re-training, by decoupling stepsize design from denoising directions and leveraging principled DP-based optimization.

Abstract

Diffusion models achieve remarkable generation quality but suffer from computationally intensive sampling due to suboptimal step discretization. While existing works focus on optimizing denoising directions, we address the principled design of stepsize schedules. This paper proposes Optimal Stepsize Distillation, a dynamic programming framework that extracts theoretically optimal schedules by distilling knowledge from reference trajectories. By reformulating stepsize optimization as recursive error minimization, our method guarantees global discretization bounds through optimal substructure exploitation. Crucially, the distilled schedules demonstrate strong robustness across architectures, ODE solvers, and noise schedules. Experiments show 10x accelerated text-to-image generation while preserving 99.4% performance on GenEval. Our code is available at https://github.com/bebebe666/OptimalSteps.

Paper Structure

This paper contains 29 sections, 1 theorem, 24 equations, 14 figures, 7 tables, 2 algorithms.

Key Result

Lemma 3.1

The optimal $m$-step denoising result of the student solver, $z[m]$, always derives from the optimal $(m-1)$-step result, $z[m-1]$, with one additional denoising step.
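The optimal substructure stated in the lemma can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `velocity` field stands in for a diffusion model, the solver is plain Euler, the error is a squared distance to the teacher state, and all function and variable names are hypothetical. The table entry for `i` student steps ending at teacher grid index `j` is built only from entries with `i - 1` steps, which is the recursion the lemma describes.

```python
import numpy as np

def velocity(x, t):
    # Hypothetical toy velocity field standing in for a diffusion model (assumption).
    return -2.0 * t * x + np.sin(3.0 * t)

def teacher_trajectory(x0, grid):
    # High-step reference: one Euler step per interval of the fine time grid.
    xs = [x0]
    for a, b in zip(grid[:-1], grid[1:]):
        xs.append(xs[-1] + (b - a) * velocity(xs[-1], a))
    return np.stack(xs)

def optimal_steps(x0, grid, num_steps):
    """Pick num_steps student steps on the teacher grid by dynamic programming."""
    n = len(grid) - 1
    ref = teacher_trajectory(x0, grid)
    # cost[i, j]: squared error to the teacher at grid index j using i student steps.
    cost = np.full((num_steps + 1, n + 1), np.inf)
    # state[i][j]: the student state that achieves cost[i, j].
    state = [[None] * (n + 1) for _ in range(num_steps + 1)]
    prev = np.zeros((num_steps + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    state[0][0] = x0
    for i in range(1, num_steps + 1):
        for j in range(i, n + 1):
            for k in range(i - 1, j):
                z = state[i - 1][k]
                if z is None:
                    continue
                # One extra Euler step from the best (i-1)-step result at index k.
                cand = z + (grid[j] - grid[k]) * velocity(z, grid[k])
                err = float(np.sum((cand - ref[j]) ** 2))
                if err < cost[i, j]:
                    cost[i, j] = err
                    state[i][j] = cand
                    prev[i, j] = k
    # Backtrack from the final grid index to recover the chosen schedule.
    schedule = [n]
    for i in range(num_steps, 0, -1):
        schedule.append(int(prev[i, schedule[-1]]))
    return schedule[::-1], float(cost[num_steps, n])

# Example: distill a 20-step teacher grid into a 5-step student schedule.
grid = np.linspace(1.0, 0.0, 21)
schedule, final_error = optimal_steps(np.array([1.0, -0.5]), grid, 5)
print(schedule, final_error)
```

The returned schedule is a list of indices into the teacher grid, so `grid[schedule]` gives the student's timesteps. The triple loop costs on the order of `num_steps * n**2` student evaluations, which stays cheap in this toy setting but would need caching or batching with a real model.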

Figures (14)

  • Figure 1: Flux sampling results using different stepsize schedules. Left: Original sampling result using $100$ steps. Middle: Optimal stepsize sampling result within $10$ steps. Right: Naively reducing sampling steps to $10$.
  • Figure 2: Two key factors in diffusion sampling: direction strategy (left) and stepsize strategy (right).
  • Figure 3: Illustration of the recursive subtasks. The optimal result at timestep $j$ using $i$ denoising steps ($z[i][j]$) derives from the optimal $(i-1)$-step denoising results ($z[i-1]$).
  • Figure 4: Amplitude of the input tensor throughout the denoising steps. The left and right plots show the 5% and 95% quantiles, respectively.
  • Figure 5: Step schedule for different solvers. Our method achieves nearly identical results from different teacher model steps.
  • ...and 9 more figures

Theorems & Definitions (2)

  • Lemma 3.1
  • proof