AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse
Zichao Yu, Zhen Zou, Guojiang Shao, Chengwei Zhang, Shengze Xu, Jie Huang, Feng Zhao, Xiaodong Cun, Wenyi Zhang
TL;DR
This paper tackles the slow inference of diffusion models by introducing AB-Cache, a training-free caching method grounded in Adams-Bashforth numerical integration. It shows theoretically that the outputs of adjacent denoising steps are linearly related, which explains their U-shaped similarity pattern, and that a $k$-th order scheme has an $O(h^k)$ truncation error. The method generalizes caching to high-order linear approximations over multiple prior steps, enabling efficient, architecture-agnostic acceleration of image and video diffusion models. Extensive experiments across models, schedulers, and tasks show nearly $3\times$ speedups while maintaining generation quality, demonstrating practical applicability for real-time diffusion-based generation.
Abstract
Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference, limiting their practicality. While existing acceleration methods exploit the well-known U-shaped similarity pattern between adjacent steps through caching mechanisms, they lack a theoretical foundation and rely on simplistic computation reuse, often leading to performance degradation. In this work, we provide a theoretical understanding by analyzing the denoising process through the second-order Adams-Bashforth method, revealing a linear relationship between the outputs of consecutive steps. This analysis explains why the outputs of adjacent steps exhibit a U-shaped pattern. Furthermore, by extending the Adams-Bashforth method to higher orders, we propose a novel caching-based acceleration approach for diffusion models. Rather than directly reusing cached results, the approach achieves a truncation error bound of only $O(h^k)$, where $h$ is the step size and $k$ is the order. Extensive validation across diverse image and video diffusion models (including HunyuanVideo and FLUX.1-dev) with various schedulers demonstrates our method's effectiveness in achieving nearly $3\times$ speedup while maintaining original performance levels, offering a practical real-time solution without compromising generation quality.
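As a rough illustration of the numerical scheme the abstract builds on, the sketch below applies the textbook Adams-Bashforth method to a toy scalar ODE ($y' = -y$), not to a diffusion model. The function names (`rk4_step`, `ab_integrate`, `observed_order`), the Runge-Kutta bootstrap, and the test problem are all assumptions chosen for illustration; this is not the authors' implementation. It shows how each step combines cached evaluations from prior steps with fixed linear weights, and how the accumulated error over a fixed interval shrinks roughly like $h^k$ for a $k$-th order scheme.

```python
import math

# Textbook Adams-Bashforth weights, newest cached evaluation first.
AB_WEIGHTS = {
    1: [1.0],
    2: [3 / 2, -1 / 2],
    3: [23 / 12, -16 / 12, 5 / 12],
}


def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step, used only until the cache is full."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def ab_integrate(f, y0, t0, t1, n_steps, order):
    """Integrate y' = f(t, y) with a uniform-step Adams-Bashforth scheme."""
    h = (t1 - t0) / n_steps
    weights = AB_WEIGHTS[order]
    t, y = t0, y0
    cache = []  # evaluations of f at previous steps, newest first
    for _ in range(n_steps):
        # Evaluate f once at the current point and keep the last `order` values.
        cache = [f(t, y)] + cache[: order - 1]
        if len(cache) < order:
            # Not enough history yet: take a self-starting step instead.
            y = rk4_step(f, t, y, h)
        else:
            # Linear combination of cached evaluations from prior steps.
            y = y + h * sum(w * g for w, g in zip(weights, cache))
        t += h
    return y


def observed_order(order, n_coarse=50):
    """Estimate the convergence rate on y' = -y, y(0) = 1, by halving h."""
    f = lambda t, y: -y
    exact = math.exp(-1.0)
    err_coarse = abs(ab_integrate(f, 1.0, 0.0, 1.0, n_coarse, order) - exact)
    err_fine = abs(ab_integrate(f, 1.0, 0.0, 1.0, 2 * n_coarse, order) - exact)
    return math.log2(err_coarse / err_fine)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"order {k}: observed convergence rate {observed_order(k):.2f}")
```

Running the script prints an estimated convergence rate for each order; on this toy problem the estimate should sit close to $k$. In a caching setting, the analogue of `cache` would hold previously computed network outputs, though how AB-Cache selects and weights them is specified by the paper itself rather than by this sketch.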
