Sublinear iterations can suffice even for DDPMs
Matthew S. Zhang, Stephen Huan, Jerry Huang, Nicholas M. Boffi, Sitan Chen, Sinho Chewi
TL;DR
Sublinear iterations can suffice for diffusion-model sampling. The authors introduce the denoising diffusion randomized midpoint method (DDRaM) and use a shifted composition framework to analyze a more efficient discretization of the DDPM reverse process, obtaining a bound of $\widetilde{O}(\sqrt{d}/\varepsilon)$ score evaluations under mild smoothness and score-estimation assumptions. The analysis gives KL guarantees relative to the target distribution while preserving the standard DDPM dynamics, and experiments on pre-trained image models show performance competitive with or superior to common solvers. The work also demonstrates applicability to different diffusion parameterizations (VP/VE/EDM) and discusses extensions to broader diffusion-model settings. Overall, it connects theoretical sublinear guarantees with practical DDPM sampling, enabling faster high-dimensional generation without altering the canonical sampler.
Abstract
SDE-based methods such as denoising diffusion probabilistic models (DDPMs) have shown remarkable success in real-world sample generation tasks. Prior analyses of DDPMs have focused on the exponential Euler discretization, showing guarantees that generally depend at least linearly on the dimension or initial Fisher information. Inspired by works in log-concave sampling (Shen and Lee, 2019), we analyze an integrator, the denoising diffusion randomized midpoint method (DDRaM), that leverages an additional randomized midpoint to better approximate the SDE. Using a recently developed analytic framework called the "shifted composition rule", we show that this algorithm enjoys favorable discretization properties under appropriate smoothness assumptions, with sublinear $\widetilde{O}(\sqrt{d})$ score evaluations needed to ensure convergence. This is the first sublinear complexity bound for pure DDPM sampling: prior works that obtained such bounds worked instead with ODE-based sampling and had to modify the sampler in ways that deviate from how it is used in practice. We also provide experimental validation of the advantages of our method, showing that it performs well in practice with pre-trained image synthesis models.
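To illustrate the randomized midpoint idea the abstract refers to, here is a minimal sketch in the simpler setting of the Langevin diffusion $dX_t = b(X_t)\,dt + \sqrt{2}\,dB_t$. It is not the DDRaM integrator itself, which acts on the DDPM reverse process and is not specified in this abstract. The function names `randomized_midpoint_step` and `sample`, the drift `b`, and the step size `h` are all assumptions made for illustration. Each step evaluates the drift at a uniformly random time inside the step, using Brownian increments that are shared between the midpoint and the full update.

```python
import numpy as np

def randomized_midpoint_step(x, drift, h, rng):
    """One randomized midpoint step for dX = drift(X) dt + sqrt(2) dB.

    Illustrative sketch only; not the DDRaM update from the paper.
    """
    # Random fraction of the step at which the drift is re-evaluated.
    u = rng.uniform()
    # Brownian increments over [0, u*h] and [u*h, h]; their sum is the
    # increment over the full step, so both updates share the same noise.
    dB_mid = np.sqrt(u * h) * rng.standard_normal(x.shape)
    dB_rest = np.sqrt((1.0 - u) * h) * rng.standard_normal(x.shape)
    # Euler-Maruyama predictor up to the random midpoint time u*h.
    x_mid = x + u * h * drift(x) + np.sqrt(2.0) * dB_mid
    # Full step using the drift evaluated at the random midpoint.
    return x + h * drift(x_mid) + np.sqrt(2.0) * (dB_mid + dB_rest)

def sample(drift, x0, h, n_steps, seed=0):
    """Run n_steps of the sketch above from the initial array x0."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = randomized_midpoint_step(x, drift, h, rng)
    return x

if __name__ == "__main__":
    # Standard Gaussian target: the score is -x, so the drift is -x.
    out = sample(lambda x: -x, 3.0 * np.ones((20000, 2)), h=0.1, n_steps=200)
    print("mean:", out.mean(), "variance:", out.var())
```

In this toy usage the target is a standard Gaussian, so the empirical mean and variance of the output should be close to 0 and 1. Evaluating the drift at a random time makes the drift term an unbiased estimate of its integral over the step, which is the mechanism behind the improved dimension dependence in the log-concave setting of Shen and Lee (2019).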
