Accelerating Diffusion Sampling with Optimized Time Steps

Shuchen Xue; Zhaoqiang Liu; Fei Chen; Shifeng Zhang; Tianyang Hu; Enze Xie; Zhenguo Li

TL;DR

The paper tackles the bottleneck of sampling efficiency in diffusion probabilistic models by optimizing the time-step schedule for high-order ODE solvers. It presents a general, training-free framework that formulates an optimization over the sampling times, solvable efficiently with a constrained trust-region method, and applicable to solvers like UniPC and DPM-Solver++. The approach yields substantial FID improvements across pixel-space and latent-space diffusion models on CIFAR-10, ImageNet, FFHQ, and AFHQv2, even when using as few as five neural function evaluations. This work enables faster, high-quality diffusion sampling with minimal overhead and broad plug-and-play applicability, making diffusion-based generation more practical for real-world deployment.

Abstract

Diffusion probabilistic models (DPMs) have shown remarkable performance in high-resolution image synthesis, but their sampling efficiency still leaves much to be desired because of the typically large number of sampling steps. Recent advances in high-order numerical ODE solvers for DPMs have enabled the generation of high-quality images with far fewer sampling steps. While this is a significant development, most sampling methods still employ uniform time steps, which is not optimal when only a small number of steps is used. To address this issue, we propose a general framework for designing an optimization problem that seeks more appropriate time steps for a specific numerical ODE solver for DPMs. This optimization problem aims to minimize the distance between the ground-truth solution to the ODE and an approximate solution corresponding to the numerical solver. It can be solved efficiently using the constrained trust region method, taking less than $15$ seconds. Our extensive experiments on both unconditional and conditional sampling using pixel- and latent-space DPMs demonstrate that, when combined with the state-of-the-art sampling method UniPC, our optimized time steps significantly improve image generation performance in terms of FID scores for datasets such as CIFAR-10 and ImageNet, compared to using uniform time steps.
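
The idea of choosing time steps to minimize the gap between the exact ODE solution and a solver's output can be illustrated on a toy problem. The sketch below is not the paper's objective or optimizer: it assumes a scalar ODE dx/dt = 1/t with a known closed-form solution, uses explicit Euler in place of UniPC, and swaps the constrained trust region method for a simple derivative-free coordinate search that keeps the time steps strictly decreasing. All function names (`uniform_steps`, `euler_error`, `optimize_steps`) are hypothetical.

```python
import math


def uniform_steps(t_start, t_end, n):
    """Return n + 1 uniformly spaced time steps from t_start down to t_end."""
    return [t_start + (t_end - t_start) * i / n for i in range(n + 1)]


def euler_error(ts):
    """Absolute error of explicit Euler on the toy ODE dx/dt = 1/t along ts.

    The exact change in x between ts[0] and ts[-1] is ln(ts[-1]) - ln(ts[0]).
    """
    approx = sum((ts[i + 1] - ts[i]) / ts[i] for i in range(len(ts) - 1))
    exact = math.log(ts[-1]) - math.log(ts[0])
    return abs(approx - exact)


def optimize_steps(ts, sweeps=200):
    """Toy coordinate search over the interior time steps (endpoints fixed).

    Each interior step is nudged toward one of its neighbours by a fraction
    of the smaller adjacent gap, so the schedule stays strictly decreasing.
    A move is kept only if it lowers the error; the fraction is halved
    whenever a full sweep yields no improvement.
    """
    ts = list(ts)
    best = euler_error(ts)
    frac = 0.5
    for _ in range(sweeps):
        improved = False
        for i in range(1, len(ts) - 1):
            gap = min(ts[i - 1] - ts[i], ts[i] - ts[i + 1])
            for delta in (frac * gap, -frac * gap):
                trial = ts[:i] + [ts[i] + delta] + ts[i + 1:]
                err = euler_error(trial)
                if err < best:
                    ts, best, improved = trial, err, True
                    break
        if not improved:
            frac *= 0.5
    return ts, best


# Assumed toy setting: integrate from t = 1.0 down to t = 0.01 in 5 steps.
uniform = uniform_steps(1.0, 0.01, 5)
optimized, opt_err = optimize_steps(uniform)
print("uniform error  :", euler_error(uniform))
print("optimized error:", opt_err)
print("optimized steps:", [round(t, 4) for t in optimized])
```

For this particular toy ODE the search moves the steps toward small t, where 1/t changes fastest, which is the same qualitative reason a uniform schedule can be a poor choice when only a handful of steps is available.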

Paper Structure (25 sections, 2 theorems, 26 equations, 8 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Training-based Methods
Adaptive Step Size
Learning to Schedule
Preliminary
Diffusion Models
Discretization Schemes
Problem Formulation
Analysis and Method
Experiments
Pixel Diffusion Model Generation
Results on CIFAR-10 32x32
Results on ImageNet 64x64
Results on FFHQ 64x64 and AFHQv2 64x64
...and 10 more sections

Key Result

Lemma 1

For any $\mathbf{x}_0 \sim q_0$ and $P_0 \in (0,1)$, with probability at least $1-P_0$, the following event occurs: For all $t \in \{t_0, t_1,\ldots,t_N\}$ and $\mathbf{x}_t \sim q_t$, we have where $\tilde{\eta} := \sqrt{\frac{N+1}{P_0}} \eta$ and $\tilde{\varepsilon}_t := \frac{\varepsilon_t\sigma_t^2}{\alpha_t}$.

Figures (8)

Figure 1: Sampling quality measured by FID ($\downarrow$) of different discretization schemes of time steps for UniPC [zhao2023unipc] with varying NFEs on various DPMs and datasets.
Figure 2: Generated images by UniPC [zhao2023unipc] with only 5 NFEs for various discretization schemes of time steps from DiT-XL-2 ImageNet 256x256 model [peebles2022scalable] (with cfg scale $s=1.5$ and the same random seed).
Figure 3: Sampling quality measured by FID ($\downarrow$) of different discretization schemes of time steps for UniPC [zhao2023unipc] with varying NFEs on MS-COCO 256x256 using PixArt-$\alpha$-256 model [chen2023pixartalpha] (with cfg scale $s=2.5$).
Figure 4: Generated images by UniPC [zhao2023unipc] with only 5 NFEs for various discretization schemes of time steps from DiT-XL-2 ImageNet 256x256 model [peebles2022scalable] (with cfg scale $s=1.5$ and the same random seed).
Figure 5: Generated images by UniPC [zhao2023unipc] with only 5 NFEs for various discretization schemes of time steps from PixArt-$\alpha$-512 model [chen2023pixartalpha] (with cfg scale $s=2.5$ and the same random seed).
...and 3 more figures

Theorems & Definitions (6)

Remark 1
Remark 2
Lemma 1
Proof of Lemma 1
Theorem 1
Proof of Theorem 1
