TC-Padé: Trajectory-Consistent Padé Approximation for Diffusion Acceleration

Benlei Cui, Shaoxuan He, Bukun Huang, Zhizeng Ye, Yunyun Sun, Longtao Huang, Hui Xue, Yang Yang, Jingqun Tang, Zhou Zhao, Haiwen Hong

TL;DR

TC-Padé incorporates adaptive coefficient modulation that leverages historical cached residuals to detect subtle trajectory transitions, and step-aware prediction strategies tailored to the distinct dynamics of early, mid, and late sampling stages, substantially outperforming existing feature caching methods.

Abstract

Despite achieving state-of-the-art generation quality, diffusion models are hindered by the substantial computational burden of their iterative sampling process. While feature caching techniques achieve effective acceleration at higher step counts (e.g., 50 steps), they exhibit critical limitations in the practical low-step regime of 20-30 steps. As the interval between steps increases, polynomial-based extrapolators like TaylorSeer suffer from error accumulation and trajectory drift. Meanwhile, conventional caching strategies often overlook the distinct dynamical properties of different denoising phases. To address these challenges, we propose Trajectory-Consistent Padé approximation, a feature prediction framework grounded in Padé approximation. By modeling feature evolution through rational functions, our approach captures asymptotic and transitional behaviors more accurately than Taylor-based methods. To enable stable and trajectory-consistent sampling under reduced step counts, TC-Padé incorporates (1) adaptive coefficient modulation that leverages historical cached residuals to detect subtle trajectory transitions, and (2) step-aware prediction strategies tailored to the distinct dynamics of early, mid, and late sampling stages. Extensive experiments on DiT-XL/2, FLUX.1-dev, and Wan2.1 across both image and video generation demonstrate the effectiveness of TC-Padé. For instance, TC-Padé achieves 2.88x acceleration on FLUX.1-dev and 1.72x on Wan2.1 while maintaining high quality across FID, CLIP, Aesthetic, and VBench-2.0 metrics, substantially outperforming existing feature caching methods.
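
To make the contrast between rational and polynomial extrapolation concrete, the following minimal sketch compares a second-order Taylor polynomial with a [1/1] Padé approximant built from the same three local coefficients. It is a scalar toy example, not the TC-Padé implementation: the function names `taylor2` and `pade_1_1`, the fallback rule for a vanishing first-order coefficient, and the test function $f(t) = 1/(1+t)$ are all assumptions made for illustration.

```python
def taylor2(c0: float, c1: float, c2: float, h: float) -> float:
    """Second-order Taylor extrapolation: c0 + c1*h + c2*h^2."""
    return c0 + c1 * h + c2 * h * h


def pade_1_1(c0: float, c1: float, c2: float, h: float) -> float:
    """[1/1] Pade approximant (a0 + a1*h) / (1 + b1*h).

    The coefficients are chosen so that the rational function matches the
    series c0 + c1*h + c2*h^2 up to second order:
        b1 = -c2 / c1,  a0 = c0,  a1 = c1 + c0 * b1.
    """
    if c1 == 0.0:
        # ASSUMPTION (illustrative fallback): the [1/1] form is undefined
        # when c1 vanishes, so fall back to the polynomial.
        return taylor2(c0, c1, c2, h)
    b1 = -c2 / c1
    a0 = c0
    a1 = c1 + c0 * b1
    return (a0 + a1 * h) / (1.0 + b1 * h)


# Toy target (assumption): f(t) = 1 / (1 + t), expanded around t = 0.
# Its series is 1 - t + t^2 - ..., so c0 = 1, c1 = -1, c2 = 1.
c0, c1, c2 = 1.0, -1.0, 1.0
h = 1.0  # a deliberately large step, mimicking a wide gap between sampled steps

true_value = 1.0 / (1.0 + h)
print("true  :", true_value)
print("taylor:", taylor2(c0, c1, c2, h))
print("pade  :", pade_1_1(c0, c1, c2, h))
```

For this particular target the rational form reproduces the function exactly, while the polynomial overshoots at the large step. Real feature trajectories are not rational functions, so the example only illustrates why a rational model can track saturating behavior that a truncated polynomial cannot.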

Paper Structure (21 sections, 9 equations, 13 figures, 8 tables)

Figures (13)

  • Figure 1: Visual quality comparison on FLUX.1-dev with 50 steps (left) and 20 steps (right). Numbers indicate speedup ratios relative to the original sampling. Previous cache-based methods exhibit notable quality degradation with altered textures and colors under low-step settings. In contrast, TC-Padé preserves visual fidelity while achieving higher acceleration. Additional qualitative comparisons with other methods are provided in the Appendix.
  • Figure 2: PCA visualization of final-layer outputs from various caching-based methods under the 20-step sampling regime. $\|{V(t)}\|$ represents the PCA results of the model's output velocity field. Data is collected from FLUX.1-dev on the DrawBench dataset (Saharia et al., 2022).
  • Figure 3: Overview of TC-Padé within a cache interval $\mathcal{N}$. Here, $\mathcal{N}=4$. $\{x_{t+3},x_{t+2},x_{t+1},x_{t}\}$ and $\{y_{t+3},y_{t+2},y_{t+1},y_{t}\}$ denote the input and output of each timestep, respectively. $\{\mathcal{R}_{t+3},\mathcal{R}_{t+2},\mathcal{R}_{t+1},\mathcal{R}_{t}\}$ are the cached residuals. $\theta$ is a predefined threshold. In each cache interval, only the initial timestep performs full computation, while subsequent timesteps adaptively determine their computation mode via the Trajectory Stableness Indicator (TSI).
  • Figure 4: (a) Residual and raw feature similarity between our method and the original sampling schedule, showing that residuals have consistently higher similarity. (b) Raw feature similarity between the original sampling schedule and TaylorSeer under different settings.
  • Figure 4: Ablation study on the impact of cached residual granularity. Entire block level caching achieves optimal performance-efficiency trade-off.
  • ...and 8 more figures
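
The control flow described in the Figure 3 caption can be sketched as follows: the first timestep of each cache interval always runs the full model, and every later timestep compares an indicator against the threshold $\theta$ to choose between full computation and prediction from cached residuals. The captions do not define the Trajectory Stableness Indicator, so `relative_change` below is a hypothetical stand-in, and the comparison direction (a low value means stable, so predict) is likewise an assumption. The names `schedule_modes`, `interval`, and `theta` are illustrative only.

```python
from typing import List, Sequence


def relative_change(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Hypothetical stability indicator: ||curr - prev|| / ||prev||.

    ASSUMPTION: this is a placeholder, not the paper's TSI definition.
    """
    num = sum((c - p) ** 2 for p, c in zip(prev, curr)) ** 0.5
    den = sum(p * p for p in prev) ** 0.5
    return num / den if den > 0 else float("inf")


def schedule_modes(indicators: List[float], interval: int = 4,
                   theta: float = 0.1) -> List[str]:
    """Pick a computation mode for each timestep.

    indicators[i] is the indicator value available at timestep i.
    """
    modes = []
    for step, value in enumerate(indicators):
        if step % interval == 0:
            # First timestep of every cache interval: always full computation.
            modes.append("full")
        elif value < theta:
            # ASSUMPTION: a small indicator means the trajectory is stable,
            # so the output is predicted from cached residuals.
            modes.append("predict")
        else:
            # Otherwise recompute to avoid drifting off the trajectory.
            modes.append("full")
    return modes


# Two cache intervals of length 4 with made-up indicator values.
values = [0.0, 0.05, 0.5, 0.01, 0.9, 0.02, 0.03, 0.2]
print(schedule_modes(values, interval=4, theta=0.1))
```

In this sketch a smaller `theta` forces more full computations (higher fidelity, less speedup), and a larger one allows more predicted steps.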