Accelerating Parallel Sampling of Diffusion Models
Zhiwei Tang, Jiasheng Tang, Hao Luo, Fan Wang, Tsung-Hui Chang
TL;DR
This paper addresses the slow sampling of diffusion models caused by autoregressive evaluation of the score network. It reformulates sampling as solving a triangular system of nonlinear equations via fixed-point iteration and introduces ParaTAA, a training-free, universal parallel sampler that uses extra compute and memory to reduce the number of inference steps by about 4–14× while keeping outputs close to those of sequential sampling. Core innovations include transforming the fixed-point formulation to accelerate convergence, a Triangular Anderson Acceleration variant tailored to triangular systems, and practical techniques such as early stopping and initialization from existing trajectories, with theoretical safeguards for convergence. The approach yields substantial practical speedups on large models such as Stable Diffusion and DiT, producing images close to those of sequential sampling in far fewer parallel steps, and it suggests broad applicability to other autoregressive diffusion and video generation tasks.
Abstract
Diffusion models have emerged as state-of-the-art generative models for image generation. However, sampling from diffusion models is usually time-consuming due to the inherent autoregressive nature of their sampling process. In this work, we propose a novel approach that accelerates the sampling of diffusion models by parallelizing the autoregressive process. Specifically, we reformulate the sampling process as solving a system of triangular nonlinear equations through fixed-point iteration. With this formulation, we explore several systematic techniques to further reduce the number of iterations required by the solving process. Applying these techniques, we introduce ParaTAA, a universal and training-free parallel sampling algorithm that can leverage extra computational and memory resources to increase the sampling speed. Our experiments demonstrate that ParaTAA can decrease the inference steps required by common sequential sampling algorithms such as DDIM and DDPM by a factor of 4$\sim$14. Notably, when applying ParaTAA with 100-step DDIM for Stable Diffusion, a widely used text-to-image diffusion model, it can produce the same images as sequential sampling in only 7 inference steps. The code is available at https://github.com/TZW1998/ParaTAA-Diffusion.
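To make the reformulation concrete, the following is a minimal sketch of plain parallel fixed-point iteration on a toy triangular system. It is not ParaTAA itself: it omits Triangular Anderson Acceleration, early stopping, and trajectory-based initialization. The functions `eps_model`, `step`, `sample_sequential`, and `sample_parallel`, and the coefficients `a` and `b`, are hypothetical stand-ins chosen for illustration, not the paper's network or the DDIM update.

```python
import numpy as np

def eps_model(x, t):
    # Hypothetical stand-in for a noise-prediction network (not a real model).
    return np.tanh(x) * (0.5 + 0.01 * t)

def step(x, t, a=0.98, b=0.05):
    # Toy deterministic update x_{t-1} = a * x_t - b * eps(x_t, t).
    # The coefficients a and b are assumptions, not a DDIM schedule.
    return a * x - b * eps_model(x, t)

def sample_sequential(x_T, T):
    # Autoregressive baseline: T dependent evaluations, one after another.
    xs = [x_T]
    for t in range(T, 0, -1):
        xs.append(step(xs[-1], t))
    return np.stack(xs)  # rows are x_T, x_{T-1}, ..., x_0

def sample_parallel(x_T, T, tol=1e-8, max_iters=None):
    # Treat the whole trajectory as the unknown of a triangular system and
    # refine every state at once from the previous iterate.
    max_iters = T if max_iters is None else max_iters
    ts = np.arange(T, 0, -1).reshape(-1, 1)   # timestep paired with each input row
    xs = np.tile(x_T, (T + 1, 1))             # initialise every state to x_T
    for k in range(1, max_iters + 1):
        new_tail = step(xs[:-1], ts)          # all T updates in one batched call
        delta = np.max(np.abs(new_tail - xs[1:]))
        xs[1:] = new_tail
        if delta < tol:                       # iterate stopped changing
            return xs, k
    return xs, max_iters

rng = np.random.default_rng(0)
x_T = rng.standard_normal(4)
seq = sample_sequential(x_T, 50)
par, iters = sample_parallel(x_T, 50)
print(np.allclose(seq, par, atol=1e-5), iters)
```

Because each state depends only on the state before it, iteration k of this scheme makes the first k states exact, so plain fixed-point iteration needs at most T batched evaluations to reproduce the sequential trajectory. The techniques the abstract describes aim to reach that same solution in far fewer iterations.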
