AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse

Zichao Yu, Zhen Zou, Guojiang Shao, Chengwei Zhang, Shengze Xu, Jie Huang, Feng Zhao, Xiaodong Cun, Wenyi Zhang

TL;DR

This paper tackles the slow inference of diffusion models by introducing AB-Cache, a training-free caching method grounded in Adams-Bashforth numerical integration. It shows theoretically that the outputs of adjacent denoising steps are linearly related, which explains the U-shaped similarity pattern between them, and it gives an $O(h^k)$ truncation error for $k$-th order schemes. The method generalizes caching from direct reuse to high-order linear approximations over multiple prior steps, enabling efficient, architecture-agnostic acceleration of image and video diffusion models. Experiments across models, schedulers, and tasks show speedups of nearly $3\times$ while maintaining generation quality, which makes the approach practical for real-time diffusion-based generation.

Abstract

Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference, limiting their practicality. While existing acceleration methods exploit the well-known U-shaped similarity pattern between adjacent steps through caching mechanisms, they lack a theoretical foundation and rely on simplistic computation reuse, often leading to performance degradation. In this work, we provide a theoretical understanding by analyzing the denoising process through the second-order Adams-Bashforth method, revealing a linear relationship between the outputs of consecutive steps. This analysis explains why the outputs of adjacent steps exhibit a U-shaped pattern. Furthermore, extending the Adams-Bashforth method to higher order, we propose a novel caching-based acceleration approach for diffusion models that, instead of directly reusing cached results, approximates them with a truncation error bound of only $O(h^k)$, where $h$ is the step size. Extensive validation across diverse image and video diffusion models (including HunyuanVideo and FLUX.1-dev) with various schedulers demonstrates our method's effectiveness in achieving nearly $3\times$ speedup while maintaining original performance levels, offering a practical real-time solution without compromising generation quality.
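For readers unfamiliar with the Adams-Bashforth family mentioned in the abstract, the following is a minimal sketch of the textbook two-step (second-order) scheme applied to a toy ODE, $y' = -y$. It is not the paper's diffusion sampler: the function names `ab2_solve` and `error`, the test problem, and the Heun bootstrap step are all assumptions chosen for illustration. The point is only to show how a new value is formed as a linear combination of slopes from previous steps, and that halving the step size $h$ cuts the error by roughly a factor of four, as expected for a second-order method.

```python
import math


def ab2_solve(f, y0, t0, t1, n_steps):
    """Integrate y' = f(t, y) from t0 to t1 with two-step Adams-Bashforth."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    f_prev = f(t, y)
    # AB2 needs two past slopes, so take the first step with Heun's method
    # (also second order) to avoid degrading the overall accuracy.
    y_pred = y + h * f_prev
    y = y + 0.5 * h * (f_prev + f(t + h, y_pred))
    t += h
    for _ in range(n_steps - 1):
        f_curr = f(t, y)
        # AB2 update: linear combination of the two most recent slopes.
        y = y + h * (1.5 * f_curr - 0.5 * f_prev)
        f_prev = f_curr
        t += h
    return y


def error(n_steps):
    """Absolute error at t = 1 for y' = -y, y(0) = 1 (exact value exp(-1))."""
    approx = ab2_solve(lambda t, y: -y, 1.0, 0.0, 1.0, n_steps)
    return abs(approx - math.exp(-1.0))


# Halving h (50 -> 100 steps) should shrink the error by about 2**2.
ratio = error(50) / error(100)
print(f"error ratio when halving h: {ratio:.2f}")
```

A $k$-step variant would combine $k$ past slopes with different coefficients; only the two-step case is shown here.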

Paper Structure

This paper contains 16 sections, 2 theorems, 24 equations, 7 figures, 4 tables.

Key Result

Proposition 1

Let $h = \lambda_t - \lambda_s = \lambda_s - \lambda_o$ be the step size. Under the same mild conditions as in lu2022dpm, we obtain:

Figures (7)

  • Figure 1: Similar to the base model, our method runs the full computation at timesteps where $t \bmod T = 0$ and relies on cached states otherwise. However, we replace simple assignment with $k$-th order Adams-Bashforth numerical integration within the cache window, achieving more accurate state transitions while maintaining efficiency, as demonstrated for the $k=2$ case.
  • Figure 2: U-type similarity curve.
  • Figure 3: Directional consistency measured by cosine distance.
  • Figure 5: Scale factor variation with timestep during diffusion model inference.
  • Figure 6: Visualization results for different acceleration methods on the FLUX.1-dev model.
  • ...and 2 more figures
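
The schedule described in the Figure 1 caption (full computation when $t \bmod T = 0$, cached estimates otherwise) can be sketched on a toy problem. Everything below is an illustrative assumption rather than the authors' implementation: `toy_model` is a hypothetical stand-in for an expensive denoiser call, and the two-point linear extrapolation is a simple second-order stand-in, since the paper's actual Adams-Bashforth coefficients are not given in this summary. The sketch compares plain cache reuse (simple assignment) against extrapolation from the two most recent computed outputs.

```python
import math


def toy_model(step):
    """Hypothetical stand-in for an expensive denoiser call; smooth in step."""
    return math.sin(0.1 * step)


def run(num_steps, interval, extrapolate):
    """Per-step outputs, calling toy_model only when step % interval == 0."""
    outputs = []
    computed = []  # (step, value) pairs from real model calls
    for step in range(num_steps):
        if step % interval == 0:
            value = toy_model(step)
            computed.append((step, value))
        elif extrapolate and len(computed) >= 2:
            # Linear extrapolation from the two most recent computed outputs.
            (s_old, v_old), (s_new, v_new) = computed[-2], computed[-1]
            slope = (v_new - v_old) / (s_new - s_old)
            value = v_new + slope * (step - s_new)
        else:
            # Plain cache reuse: copy the last computed output.
            value = computed[-1][1]
        outputs.append(value)
    return outputs


def max_error(outputs, skip):
    """Largest deviation from the true output, ignoring the first skip steps."""
    return max(
        abs(value - toy_model(step))
        for step, value in enumerate(outputs)
        if step >= skip
    )


# Skip the warm-up window, where only one computed output is available.
reuse_err = max_error(run(30, 3, extrapolate=False), skip=3)
extrap_err = max_error(run(30, 3, extrapolate=True), skip=3)
print(f"reuse: {reuse_err:.4f}  extrapolation: {extrap_err:.4f}")
```

Both variants make the same number of `toy_model` calls; the difference lies only in how the skipped steps are filled in, which is the design choice the caption highlights.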

Theorems & Definitions (5)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Remark 1