Sublinear iterations can suffice even for DDPMs

Matthew S. Zhang, Stephen Huan, Jerry Huang, Nicholas M. Boffi, Sitan Chen, Sinho Chewi

TL;DR

A number of iterations sublinear in the dimension $d$ can suffice for diffusion-model sampling. The authors introduce the denoising diffusion randomized midpoint method (DDRaM), which discretizes the DDPM reverse process more efficiently, and analyze it with the recently developed shifted composition rule, obtaining a bound of $\widetilde{O}(\sqrt{d}/\varepsilon)$ score evaluations under mild smoothness and score-estimation assumptions. The analysis gives guarantees in KL divergence relative to the target distribution while preserving the standard DDPM dynamics, and experiments on pre-trained image models show performance competitive with or superior to common solvers. The work also demonstrates applicability to different diffusion parameterizations (VP/VE/EDM) and discusses extensions to broader diffusion-model settings. Overall, it connects theoretical sublinear guarantees with practical DDPM sampling, enabling faster high-dimensional generation without altering the canonical sampler.

Abstract

SDE-based methods such as denoising diffusion probabilistic models (DDPMs) have shown remarkable success in real-world sample generation tasks. Prior analyses of DDPMs have been focused on the exponential Euler discretization, showing guarantees that generally depend at least linearly on the dimension or initial Fisher information. Inspired by works in log-concave sampling (Shen and Lee, 2019), we analyze an integrator -- the denoising diffusion randomized midpoint method (DDRaM) -- that leverages an additional randomized midpoint to better approximate the SDE. Using a recently-developed analytic framework called the "shifted composition rule", we show that this algorithm enjoys favorable discretization properties under appropriate smoothness assumptions, with sublinear $\widetilde{O}(\sqrt{d})$ score evaluations needed to ensure convergence. This is the first sublinear complexity bound for pure DDPM sampling -- prior works which obtained such bounds worked instead with ODE-based sampling and had to make modifications to the sampler which deviate from how they are used in practice. We also provide experimental validation of the advantages of our method, showing that it performs well in practice with pre-trained image synthesis models.
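
To make the randomized-midpoint idea concrete, the following is a minimal sketch of a generic randomized midpoint step in the style of Shen and Lee (2019), applied to a reverse Ornstein--Uhlenbeck SDE. It is not the authors' DDRaM update, which is not specified on this page. Everything here is an illustrative assumption: the forward convention $dX_t = -X_t\,dt + \sqrt{2}\,dB_t$, the toy one-dimensional Gaussian data distribution with variance `SIGMA2` (chosen so that the score is known exactly), and all function names.

```python
import math
import random

SIGMA2 = 4.0  # variance of the toy 1-D Gaussian data distribution (assumption)


def score(s, x):
    """Exact score of pi_s for the toy Gaussian; s is the forward OU time."""
    var_s = 1.0 + (SIGMA2 - 1.0) * math.exp(-2.0 * s)
    return -x / var_s


def reverse_drift(t, y, T):
    """Drift of the reverse SDE dY = (Y + 2 * score_{T-t}(Y)) dt + sqrt(2) dB."""
    return y + 2.0 * score(T - t, y)


def randomized_midpoint_step(t, y, h, T, rng):
    """One generic randomized midpoint step of size h (hypothetical helper)."""
    u = rng.random()  # random fraction of the step at which the drift is re-evaluated
    # Brownian increments over [t, t + u*h] and [t, t + h], coupled so that
    # the second contains the first.
    db_mid = math.sqrt(u * h) * rng.gauss(0.0, 1.0)
    db_full = db_mid + math.sqrt((1.0 - u) * h) * rng.gauss(0.0, 1.0)
    # Euler-type predictor to the random intermediate time (first score evaluation).
    y_mid = y + u * h * reverse_drift(t, y, T) + math.sqrt(2.0) * db_mid
    # Full step using the drift at the random midpoint (second score evaluation).
    return y + h * reverse_drift(t + u * h, y_mid, T) + math.sqrt(2.0) * db_full


def draw_samples(n_samples, n_steps=60, T=3.0, seed=0):
    """Run the reverse process from N(0, 1) for n_steps steps, n_samples times."""
    rng = random.Random(seed)
    h = T / n_steps
    samples = []
    for _ in range(n_samples):
        y = rng.gauss(0.0, 1.0)  # approximate stationary initialization
        for k in range(n_steps):
            y = randomized_midpoint_step(k * h, y, h, T, rng)
        samples.append(y)
    return samples
```

Each step in this sketch calls the score twice, once at the current point and once at the random intermediate point, so $n$ steps cost $2n$ score evaluations. With the exact score, the empirical variance of `draw_samples(...)` should land close to `SIGMA2`.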

Paper Structure

This paper contains 39 sections, 11 theorems, 97 equations, 5 figures, 2 algorithms.

Key Result

Theorem 1

Let $\varepsilon > 0$, and let $\pi$ be a data distribution over $\mathbb{R}^d$ with bounded second moment. Suppose we have estimates $(\mathsf{s}_t)$ for its scores $(\nabla \log \pi_t)$ along the Ornstein--Uhlenbeck process that are $\widetilde{O}(\varepsilon)$-accurate in $L^2(\pi_t)$ and $L_t$-L

Figures (5)

  • Figure 5.1: Qualitative baseline comparison. Listing from left to right, we show a qualitative comparison between the Euler--Maruyama sampler \ref{eq:euler_maru}, the Euler exponential integrator \ref{eq:exp_integrator_exact}, and \ref{eq:rmd-alg} on the AFHQv2 dataset (Choi et al., 2020). All samplers use 64 score function evaluations (64 Euler integration steps, 32 midpoint steps) and leverage the EDM pre-trained unconditional VP model from Karras et al. (2022) at $64 \times 64$ resolution over the OU process \ref{eq:OU-reverse}. Clearly \ref{eq:rmd-alg} attains the best visual performance, which we quantify in Figure 5.2.
  • Figure 5.2: Quantitative baseline comparison. Image quality measured by FID (top) and $\text{FD}_{\text{DINOv2}}$ (bottom) versus number of score function evaluations (NFEs) for the \ref{eq:euler_maru}, \ref{eq:exp_integrator_exact}, and \ref{eq:randomized-midpoint} methods run on the OU process. Supporting Figure 5.1, \ref{eq:randomized-midpoint} obtains the best quantitative results.
  • Figure 5.3: Quantitative results: Deterministic sampling. Image quality measured by Fréchet inception distance (FID$\downarrow$) with number of score function evaluations (NFEs) for the Euler, Heun, and randomized midpoint methods. Columns correspond to the VP, VE, and EDM processes. For $n$ steps of the solver, Euler takes $n$ NFEs, Heun takes $2 n - 1$ (since Karras et al. (2022) run Euler on the last step to avoid the singularity at 0), and \ref{eq:randomized-midpoint} takes $2 n$. As a result, \ref{eq:randomized-midpoint} has one extra NFE compared to Euler and Heun in these plots. Figure C.1 measures using $\text{FD}_{\text{DINOv2}}$ on the same images and shows similar results. Figure C.2 shows the NFE curves on a shared $y$-axis.
  • Figure C.1: Image quality as measured by $\text{FD}_{\text{DINOv2}}$. Figure 5.3 uses the same generated images.
  • Figure C.2: A variant of Figure 5.3 with all methods and settings shown on the same scale.
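
The NFE accounting stated in the captions of Figures 5.1 and 5.3 can be summarized in a short helper. This is only an illustrative sketch: the function name `nfe` and the solver labels are hypothetical, while the counts themselves ($n$ for Euler, $2n - 1$ for Heun, $2n$ for randomized midpoint) are taken from the captions.

```python
def nfe(solver, n_steps):
    """Number of score function evaluations for n_steps solver steps.

    Counts follow the figure captions; the solver labels are hypothetical.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    if solver == "euler":
        return n_steps  # one evaluation per step
    if solver == "heun":
        return 2 * n_steps - 1  # final step falls back to Euler
    if solver == "randomized_midpoint":
        return 2 * n_steps  # two evaluations per step
    raise ValueError(f"unknown solver: {solver}")
```

For example, `nfe("euler", 64)` and `nfe("randomized_midpoint", 32)` give the same budget, which matches the "64 Euler integration steps, 32 midpoint steps" setting of Figure 5.1.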

Theorems & Definitions (11)

  • Theorem 1: Informal, see Theorem \ref{thm:main-varying}
  • Lemma 2
  • Theorem 3: Main result
  • Lemma 4: Magic lemma I, adapted from Conforti et al. (2025)
  • Lemma 5: Magic lemma II, adapted from Conforti et al. (2025)
  • Lemma 6
  • Lemma 7: Pointwise local errors
  • Lemma 8: Score estimator bounds
  • Lemma 9: Local errors
  • Lemma 10: Properties of \ref{eq:randomized-midpoint}
  • ...and 1 more