Accelerating Diffusion Models with Parallel Sampling: Inference at Sub-Linear Time Complexity

Haoxuan Chen, Yinuo Ren, Lexing Ying, Grant M. Rotskoff

TL;DR

The paper addresses the high cost of diffusion-model inference by introducing Parallelized Inference Algorithms for Diffusion Models (PIADM), which partition the sampling horizon into a small number of blocks and evaluate the score function in parallel within each block. It develops two schemes, PIADM-SDE and PIADM-ODE, which combine exponential integrators with Picard iterations to reach a time complexity that is poly-logarithmic in the data dimension $d$ while keeping the discretization error controlled. The theoretical guarantees bound the distance to the target distribution in KL divergence or total variation, and the SDE and ODE implementations have space complexities of order $d^2$ and $d^{3/2}$, respectively, while retaining sub-linear time complexity. The results pair a rigorous probabilistic analysis (based on a generalized Girsanov theorem and stochastic process constructions) with practical algorithm design, providing a foundation for fast, scalable diffusion-based sampling on modern hardware.

Abstract

Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and \emph{evaluate}, reducing the inference cost for diffusion models remains a major goal. Inspired by the recent empirical success in accelerating diffusion models via the parallel sampling technique~\cite{shih2024parallel}, we propose to divide the sampling process into $\mathcal{O}(1)$ blocks with parallelizable Picard iterations within each block. Rigorous theoretical analysis reveals that our algorithm achieves $\widetilde{\mathcal{O}}(\mathrm{poly} \log d)$ overall time complexity, marking \emph{the first implementation with provable sub-linear complexity w.r.t. the data dimension $d$}. Our analysis is based on a generalized version of Girsanov's theorem and is compatible with both the SDE and probability flow ODE implementations. Our results shed light on the potential of fast and efficient sampling of high-dimensional data on fast-evolving modern large-memory GPU clusters.
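
The block-plus-Picard idea in the abstract can be illustrated with a small sketch. This is not the authors' PIADM algorithm: it assumes a deterministic toy ODE, a plain Euler grid in place of the exponential integrator, and a hypothetical `drift` function standing in for the learned score network. The names `picard_block`, `sample`, and `sample_sequential` are illustrative only. Within each block, every Picard iteration evaluates the drift at all grid points in one batched call, which is the step that would run in parallel on an accelerator.

```python
import numpy as np


def drift(x, t):
    # Hypothetical stand-in for the learned drift (assumption, not from the paper).
    # Works for a single state (d,) with scalar t, or a batch (M, d) with t of shape (M,).
    return -np.tanh(x) + np.expand_dims(np.sin(t), -1)


def picard_block(x0, t_grid, drift_fn, num_iters):
    """Run num_iters Picard iterations on one block; returns an (M + 1, d) trajectory."""
    num_steps = len(t_grid) - 1
    dt = np.diff(t_grid)                       # step sizes, shape (M,)
    traj = np.tile(x0, (num_steps + 1, 1))     # initial guess: constant path
    for _ in range(num_iters):
        # One batched drift evaluation over all M grid points (the parallel step).
        f = drift_fn(traj[:-1], t_grid[:-1])   # shape (M, d)
        increments = np.cumsum(f * dt[:, None], axis=0)
        traj = np.concatenate([x0[None], x0[None] + increments], axis=0)
    return traj


def sample(x0, horizon, num_blocks, steps_per_block, num_iters, drift_fn):
    """Blocks are processed one after another; work inside a block is batched."""
    edges = np.linspace(0.0, horizon, num_blocks + 1)
    x = x0
    for n in range(num_blocks):
        t_grid = np.linspace(edges[n], edges[n + 1], steps_per_block + 1)
        x = picard_block(x, t_grid, drift_fn, num_iters)[-1]
    return x


def sample_sequential(x0, horizon, num_blocks, steps_per_block, drift_fn):
    """Reference: ordinary step-by-step Euler integration on the same grid."""
    edges = np.linspace(0.0, horizon, num_blocks + 1)
    x = x0
    for n in range(num_blocks):
        t_grid = np.linspace(edges[n], edges[n + 1], steps_per_block + 1)
        for m in range(steps_per_block):
            x = x + (t_grid[m + 1] - t_grid[m]) * drift_fn(x, t_grid[m])
    return x
```

In this sketch the number of sequential drift calls is `num_blocks * num_iters` rather than `num_blocks * steps_per_block`. On a fixed grid, running as many Picard iterations as there are steps reproduces the sequential Euler result up to floating-point rounding, and the hope is that far fewer iterations already give a close approximation when the block length is short.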

Paper Structure

This paper contains 36 sections, 25 theorems, 180 equations, 2 figures, 2 tables, 2 algorithms.

Key Result

Theorem 3.3

Under Assumptions ass:L2acc, ass:pdata, and ass:lipNN, given suitable choices of the order of the parameters and with $L_{\bm{s}}^2 h_n e^{\frac{7}{2}h_n} \ll 1$, $\delta_2 \lesssim \delta$, $T \lesssim \log \eta^{-1}$, the distribution $\widehat{q}_{t_N}$ from which PIADM-SDE (Algorithm alg:sde) generates samples satisfies the following error bound: with a total of $KN = \widetilde{\mathcal{O}}(\ldots)$ …

Figures (2)

  • Figure 1: Illustration of PIADM-SDE/ODE. The outer iterations are divided into $\mathcal{O}(\log d)$ blocks of $\mathcal{O}(1)$ length. Within each block, the inner iterations are parallelized with $\widetilde{\mathcal{O}}(d)$ steps for the SDE implementation (cf. Theorem 3.3) or $\widetilde{\mathcal{O}}(\sqrt{d})$ steps for the probability flow ODE implementation (cf. Theorem 3.5). The overall approximate time complexity is $KN = \widetilde{\mathcal{O}}(\mathrm{poly} \log d)$. The brown, green, blue, and red curves represent the computation graph at $t = t_n + \tau_{n,m}$ for $m = 1, 2, M_n - 1, M_n$.
  • Figure 2: Illustration of the proof pipeline of Theorem 3.5 for PIADM-ODE within the $n$-th block.
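
The parallel inner iterations in Figure 1 build on Picard iteration. As a rough sketch, written for a generic ODE $\dot{x} = f(x, t)$ rather than the paper's exact exponential-integrator scheme, the update on the $n$-th block $[t_n, t_n + h_n]$ can be written as follows. The symbols $f$, $x^{(k)}$, and the reading of $K$ as the number of Picard iterations per block are assumptions made for illustration.

```latex
% Sketch under stated assumptions: generic drift f, block [t_n, t_n + h_n],
% K Picard iterations per block (f, x^{(k)}, and this reading of K are illustrative).
\begin{align*}
x^{(0)}(t_n + \tau) &= x(t_n), \\
x^{(k+1)}(t_n + \tau) &= x(t_n) + \int_0^{\tau} f\bigl(x^{(k)}(t_n + s),\, t_n + s\bigr)\,\mathrm{d}s,
  \qquad \tau \in [0, h_n],\quad k = 0, \dots, K - 1.
\end{align*}
```

The right-hand side depends only on the previous iterate $x^{(k)}$, so the evaluations of $f$ at all grid points $\tau_{n,1}, \dots, \tau_{n,M_n}$ can be batched into one parallel round. Under this reading, each block costs $K$ sequential rounds and $N$ blocks cost $KN$, which is one way to interpret the $KN$ in the caption. The precise notion is the paper's Definition 2.1.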

Theorems & Definitions (60)

  • Definition 2.1: Approximate time complexity
  • Remark 3.1
  • Remark 3.2
  • Theorem 3.3: Theoretical Guarantees for PIADM-SDE
  • Remark 3.4
  • Theorem 3.5: Theoretical Guarantees for PIADM-ODE
  • Theorem A.1: Properties of $f$-divergence
  • Definition A.2
  • Remark A.3
  • Theorem A.4: Girsanov's Theorem [oksendal2013stochastic]
  • ...and 50 more