Accelerating Parallel Sampling of Diffusion Models

Zhiwei Tang, Jiasheng Tang, Hao Luo, Fan Wang, Tsung-Hui Chang

TL;DR

This paper addresses the slow sampling of diffusion models caused by autoregressive evaluation of the score network. It reformulates sampling as solving a system of triangular nonlinear equations via fixed-point iteration and introduces ParaTAA, a training-free universal parallel sampler that leverages extra compute to dramatically reduce inference steps (about 4–14×) while preserving outputs close to sequential sampling. Core innovations include transforming the fixed-point landscape to accelerate convergence, a Triangular Anderson Acceleration variant tailored for triangular systems, and practical tricks like early stopping and initialization from existing trajectories, with theoretical safeguards for convergence. The approach yields substantial practical speedups on large models such as Stable Diffusion and DiT, enabling near-sequential-quality images at far fewer parallel steps and suggesting broad applicability to other autoregressive diffusion and video generation tasks.

Abstract

Diffusion models have emerged as state-of-the-art generative models for image generation. However, sampling from diffusion models is usually time-consuming due to the inherent autoregressive nature of their sampling process. In this work, we propose a novel approach that accelerates the sampling of diffusion models by parallelizing the autoregressive process. Specifically, we reformulate the sampling process as solving a system of triangular nonlinear equations through fixed-point iteration. With this innovative formulation, we explore several systematic techniques to further reduce the iteration steps required by the solving process. Applying these techniques, we introduce ParaTAA, a universal and training-free parallel sampling algorithm that can leverage extra computational and memory resources to increase the sampling speed. Our experiments demonstrate that ParaTAA can decrease the inference steps required by common sequential sampling algorithms such as DDIM and DDPM by a factor of 4$\sim$14. Notably, when applying ParaTAA with 100-step DDIM for Stable Diffusion, a widely used text-to-image diffusion model, it can produce the same images as sequential sampling in only 7 inference steps. The code is available at https://github.com/TZW1998/ParaTAA-Diffusion.
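To make the reformulation concrete, here is a minimal sketch of treating sequential sampling as a triangular system solved by plain fixed-point iteration. It is a toy illustration, not ParaTAA itself: it has no Anderson acceleration, windowing, or early-stopping logic from the paper. The function `eps` is a hypothetical stand-in for the noise-prediction network, and the coefficients `a` and `b` from `make_coeffs` are arbitrary values chosen for illustration, not a real DDIM schedule.

```python
import numpy as np


def eps(x, t):
    """Hypothetical stand-in for the noise-prediction network eps_theta(x_t, t)."""
    return np.tanh(x) * (0.5 + 0.01 * t)


def make_coeffs(T):
    """Arbitrary illustrative coefficients (index 0 is unused); not a real schedule."""
    a = np.linspace(0.99, 0.9, T + 1)
    b = np.linspace(-0.05, -0.2, T + 1)
    return a, b


def sequential_sample(x_T, a, b):
    """Autoregressive sampling: x_{t-1} = a_t * x_t + b_t * eps(x_t, t)."""
    T = len(a) - 1
    xs = np.empty((T + 1,) + x_T.shape)
    xs[T] = x_T
    for t in range(T, 0, -1):  # T sequential "network" calls
        xs[t - 1] = a[t] * xs[t] + b[t] * eps(xs[t], t)
    return xs


def parallel_fixed_point(x_T, a, b, max_iters, tol=1e-10):
    """Solve the same T equations jointly by fixed-point iteration.

    Every iteration evaluates eps at all T timesteps in one batch, so the
    number of iterations plays the role of the number of inference steps.
    """
    T = len(a) - 1
    xs = np.tile(x_T, (T + 1, 1))      # initialise every x_t with x_T
    ts = np.arange(1, T + 1)[:, None]  # timesteps 1..T as a column vector
    for it in range(1, max_iters + 1):
        e = eps(xs[1:], ts)            # one batched call over all timesteps
        new = xs.copy()                # x_T (last row) is never modified
        new[:-1] = a[1:, None] * xs[1:] + b[1:, None] * e
        residual = np.max(np.abs(new - xs))
        xs = new
        if residual < tol:
            return xs, it
    return xs, max_iters


if __name__ == "__main__":
    T = 50
    a, b = make_coeffs(T)
    x_T = np.random.default_rng(0).standard_normal(4)
    seq = sequential_sample(x_T, a, b)
    par, iters = parallel_fixed_point(x_T, a, b, max_iters=T)
    print("iterations used:", iters)
    print("max abs difference:", np.max(np.abs(seq - par)))
```

Because each equation depends only on later timesteps, iteration n of this scheme fixes at least one more state exactly, so the plain iteration matches the sequential trajectory after at most T iterations. The techniques summarized above aim to reach that trajectory in far fewer iterations.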

Paper Structure

This paper contains 24 sections, 3 theorems, 26 equations, 15 figures, 1 table, 1 algorithm.

Key Result

Theorem 2.2

The $k$-th order nonlinear equations (Definition 2.1) with different orders $k$ are all equivalent and possess a unique solution.
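As a rough illustration of why such equivalence can hold, consider a generic sampler recursion. The symbols $a_t$, $b_t$, $c_t$, $\xi_t$, and $g_t$ below are assumed notation for this sketch and may differ from the paper's Definition 2.1. With $x_T$ and the noise terms held fixed, unrolling the first-order recursion gives the higher-order forms:

```latex
% Assumed generic recursion (illustrative notation, not taken from the paper):
%   g_t(x_t) := b_t \epsilon_\theta(x_t, t) + c_t \xi_t,  for t = T, ..., 1
\begin{align*}
x_{t-1} &= a_t x_t + g_t(x_t)
  && (k = 1) \\
x_{t-1} &= a_t a_{t+1} x_{t+1} + g_t(x_t) + a_t\, g_{t+1}(x_{t+1})
  && (k = 2,\ t \le T-1) \\
x_{t-1} &= \Big(\prod_{i=t}^{j} a_i\Big) x_j
  + \sum_{i=t}^{j} \Big(\prod_{l=t}^{i-1} a_l\Big) g_i(x_i),
  \quad j = \min(t+k-1,\ T)
  && (\text{general } k)
\end{align*}
```

Each higher-order line follows from substituting the first-order equation for $x_t, x_{t+1}, \dots$ into the line above it, with empty products equal to 1. In the other direction, the order-$k$ equation at $t = T$ is already the first-order one, and working downward in $t$ recovers the remaining first-order equations. Since $x_T$ is given and each $x_{t-1}$ depends only on later states, back-substitution determines a single solution. This is a sketch of the idea under the stated assumptions, not the paper's proof.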

Figures (15)

  • Figure 1: Convergence of residuals under different orders. The x-axis shows the iteration steps and the y-axis shows the value of $\sum_{t=1}^T r_{t-1}$.
  • Figure 2: Convergence of FP, AA, TAA under different $k$.
  • Figure 3: Comparison of parallel sampling methods and sequential sampling across various scenarios. The x-axis for all plots represents the maximum number of steps, $s_{\max}$. The first two columns from the left show the FID and IS scores for the DiT model, respectively, while the third column depicts the CS for the SD model. The rows, from top to bottom, correspond to the scenarios with DDIM 25 steps, DDIM 50 steps, DDIM 100 steps, and DDPM 100 steps, respectively. For visual examples of generated images related to these results, please refer to the appendix on generated images.
  • Figure 4: Convergence of ParaTAA under different window sizes. The x-axis and y-axis are the same as in the main-results figure.
  • Figure 5: Iterations of ParaTAA with different initializations. From top to bottom, the rows show: (1) sampling with P1 with random initialization; (2) sampling with P2 with random initialization; (3) sampling with P2 with the trajectory of P1 as initialization and $T_{\text{init}}=50$; (4) the same as (3) except that $T_{\text{init}}=35$. For optimal viewing, please zoom in on the figure.
  • ...and 10 more figures

Theorems & Definitions (16)

  • Definition 2.1: $k$-th order nonlinear equations
  • Theorem 2.2
  • Remark 2.3
  • Remark 2.4
  • Definition 3.1: Block Upper Triangular Matrix
  • Theorem 3.2
  • Remark 3.3
  • Remark 3.4
  • Remark 3.5
  • Theorem 3.6
  • ...and 6 more