AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers

Dong Liu, Yanxuan Yu, Ben Lengerich, Ying Nian Wu

TL;DR

Diffusion Transformers incur high inference costs due to dense per-layer computations across timesteps. AdaCorrection introduces an adaptive, training-free framework that detects spatio-temporal drift in cached activations via an Offset Estimation Module (OEM) and corrects it with an Adaptive Correction Module (ACM) by interpolating between cached and fresh activations using a per-layer weight $\lambda_t^{\ell}$. The method provides theoretical bounds on correction error, analyzes computational complexity, and demonstrates strong, consistent gains across multiple backbones and datasets while preserving throughput. Overall, AdaCorrection shifts the quality-speed Pareto frontier toward higher fidelity with minimal overhead, enabling practical, plug-and-play acceleration for diffusion-based image and video generation.
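The ACM interpolation step can be sketched as below. This is a minimal illustration, not the authors' implementation. It assumes $\lambda_t^{\ell}$ weights the fresh activation ($\lambda = 1$ recomputes fully, $\lambda = 0$ reuses the cache), an orientation chosen to match the $(1-\lambda_t^{\ell})$ factor in Proposition 1. The function names `adaptive_blend` and `correct_layers` are hypothetical, and activations are flattened to plain Python lists.

```python
from typing import List, Sequence

Vector = List[float]


def adaptive_blend(cached: Sequence[float], fresh: Sequence[float], lam: float) -> Vector:
    """Blend one layer's cached and freshly computed activation.

    Assumed convention: lam = 1.0 keeps only the fresh activation,
    lam = 0.0 reuses the cache unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(cached) != len(fresh):
        raise ValueError("cached and fresh activations must have the same size")
    return [lam * f + (1.0 - lam) * c for c, f in zip(cached, fresh)]


def correct_layers(
    cached_layers: Sequence[Sequence[float]],
    fresh_layers: Sequence[Sequence[float]],
    lambdas: Sequence[float],
) -> List[Vector]:
    """Apply a separate weight lambda_t^l to every layer l at the current timestep."""
    if not len(cached_layers) == len(fresh_layers) == len(lambdas):
        raise ValueError("need one cached activation, one fresh activation and one weight per layer")
    return [adaptive_blend(c, f, lam) for c, f, lam in zip(cached_layers, fresh_layers, lambdas)]


# Two layers, each with a two-element activation.
out = correct_layers([[0.0, 2.0], [1.0, 1.0]], [[1.0, 4.0], [1.0, 3.0]], [0.5, 0.25])
print(out)  # [[0.5, 3.0], [1.0, 1.5]]
```

Keeping one weight per layer, rather than a single global weight, lets layers whose cached features are still valid stay close to the cache while drifting layers lean on fresh computation.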

Abstract

Diffusion Transformers (DiTs) achieve state-of-the-art performance in high-fidelity image and video generation but suffer from expensive inference due to their iterative denoising structure. While prior methods accelerate sampling by caching intermediate features, they rely on static reuse schedules or coarse-grained heuristics, which often lead to temporal drift and cache misalignment that significantly degrade generation quality. We introduce AdaCorrection, an adaptive offset cache correction framework that maintains high generation fidelity while enabling efficient cache reuse across Transformer layers during diffusion inference. At each timestep, AdaCorrection estimates cache validity with lightweight spatio-temporal signals and adaptively blends cached and fresh activations. This correction is computed on-the-fly without additional supervision or retraining. Our approach achieves strong generation quality with minimal computational overhead, maintaining near-original FID while providing moderate acceleration. Experiments on image and video diffusion benchmarks show that AdaCorrection consistently improves generation performance.
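One way to turn lightweight spatio-temporal signals into a per-layer blend weight is sketched below. Everything here is an assumption for illustration: the paper's actual offset score (including the spatial variation of Eq. (2)) is not reproduced in this summary, so `temporal_drift`, `spatial_variation`, `offset_to_lambda`, the multiplicative combination of the two signals, and the `sensitivity` parameter are hypothetical stand-ins. Tokens are treated as scalars on an H x W grid to keep the example dependency-free.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def temporal_drift(current: Sequence[float], cached: Sequence[float], eps: float = 1e-8) -> float:
    """Relative L2 change between the layer input now and when the cache was filled."""
    if len(current) != len(cached):
        raise ValueError("inputs must have the same size")
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(current, cached)))
    den = math.sqrt(sum(b * b for b in cached))
    return num / (den + eps)


def spatial_variation(grid: Grid) -> float:
    """Mean absolute difference between horizontally and vertically adjacent tokens."""
    if not grid or not grid[0]:
        return 0.0
    rows, cols = len(grid), len(grid[0])
    diffs = []
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:
                diffs.append(abs(grid[i][j + 1] - grid[i][j]))
            if i + 1 < rows:
                diffs.append(abs(grid[i + 1][j] - grid[i][j]))
    return sum(diffs) / len(diffs) if diffs else 0.0


def offset_to_lambda(drift: float, variation: float, sensitivity: float = 1.0) -> float:
    """Map a non-negative offset score to a blend weight in [0, 1].

    Zero drift gives lambda = 0 (reuse the cache); large scores approach 1 (recompute).
    """
    score = drift * (1.0 + variation)
    return 1.0 - math.exp(-sensitivity * score)


def flatten(grid: Grid) -> List[float]:
    return [v for row in grid for v in row]


prev = [[1.0, 1.0], [1.0, 1.0]]   # input tokens when the cache was filled
now = [[1.0, 1.0], [1.0, 3.0]]    # input tokens at the current timestep
lam = offset_to_lambda(temporal_drift(flatten(now), flatten(prev)), spatial_variation(now))
print(round(lam, 4))  # 0.8647
```

With this mapping an unchanged input yields $\lambda = 0$ (pure cache reuse), while larger drift or more spatially varied content pushes $\lambda$ toward 1 (mostly fresh computation).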

Paper Structure

This paper contains 23 sections, 2 theorems, 8 equations, 5 figures, 4 tables, 1 algorithm.

Key Result

Proposition 1 (Bounded Error Propagation)

Assume each Transformer block $\mathrm{Block}_\ell$ is $L$-Lipschitz and that the cached input is reused with lag $\tau \ge 0$. Under the adaptive interpolation of Eq. (7), the instantaneous deviation is bounded by $\|h_t^{\ell+1}-\hat{h}_t^{\ell+1}\|_2 \le (1-\lambda_t^{\ell})\,L\,\tau\,S_t^{\ell}$.
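Under one reading of the statement, the bound follows in two steps. Suppose (an assumption, since Eq. (7) is not reproduced here) that $\hat{h}_t^{\ell+1} = \lambda_t^{\ell}\,\mathrm{Block}_\ell(h_t^{\ell}) + (1-\lambda_t^{\ell})\,\mathrm{Block}_\ell(h_{t-\tau}^{\ell})$ and that $S_t^{\ell}$ bounds the per-step drift of the block input, so $\|h_t^{\ell}-h_{t-\tau}^{\ell}\|_2 \le \tau S_t^{\ell}$. Then $h_t^{\ell+1}-\hat{h}_t^{\ell+1} = (1-\lambda_t^{\ell})\big(\mathrm{Block}_\ell(h_t^{\ell})-\mathrm{Block}_\ell(h_{t-\tau}^{\ell})\big)$, and the Lipschitz property gives a norm of at most $(1-\lambda_t^{\ell})\,L\,\|h_t^{\ell}-h_{t-\tau}^{\ell}\|_2 \le (1-\lambda_t^{\ell})\,L\,\tau\,S_t^{\ell}$. The toy check below verifies this numerically; the block $L\tanh(x)$ and all function names are hypothetical, chosen only because $\tanh$ is 1-Lipschitz.

```python
import math
import random
from typing import List, Sequence, Tuple


def toy_block(x: Sequence[float], lipschitz: float) -> List[float]:
    """Hypothetical L-Lipschitz block: elementwise L * tanh(x), since tanh is 1-Lipschitz."""
    return [lipschitz * math.tanh(v) for v in x]


def l2(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def check_bound(dim: int = 16, tau: int = 3, step_drift: float = 0.1,
                lam: float = 0.4, lipschitz: float = 2.0, seed: int = 0) -> Tuple[float, float]:
    """Return (observed deviation, Proposition 1 bound) for one random input trajectory."""
    rng = random.Random(seed)
    # Trajectory h_{t-tau}, ..., h_t: every step moves by exactly step_drift in L2,
    # so ||h_t - h_{t-tau}|| <= tau * step_drift (step_drift plays the role of S_t^l).
    traj = [[rng.gauss(0.0, 1.0) for _ in range(dim)]]
    for _ in range(tau):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        traj.append([x + step_drift * d / norm for x, d in zip(traj[-1], direction)])
    exact = toy_block(traj[-1], lipschitz)   # h_t^{l+1}, computed from the current input
    cached = toy_block(traj[0], lipschitz)   # block output cached tau steps ago
    corrected = [lam * e + (1.0 - lam) * c for e, c in zip(exact, cached)]
    deviation = l2(exact, corrected)
    bound = (1.0 - lam) * lipschitz * tau * step_drift
    return deviation, bound


for lam in (0.0, 0.5, 1.0):
    dev, bnd = check_bound(lam=lam)
    print(f"lambda={lam}: deviation={dev:.4f} <= bound={bnd:.4f}")
```

Setting `lam=1.0` drives both sides to zero, while `lam=0.0` recovers the plain Lipschitz bound for uncorrected cache reuse.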

Figures (5)

  • Figure 1: Cache misalignment (top) and AdaCorrection solution (bottom).
  • Figure 2: Spatial variation heatmap (darker = higher variation, cf. Eq. (2)).
  • Figure 3: Quality-Speed Trade-off Analysis. AdaCorrection consistently improves generation quality (lower FID) while maintaining competitive speedup across different caching methods. Arrows indicate improvements from baseline methods (circles) to AdaCorrection-enhanced versions (squares). The method shifts the Pareto frontier toward better quality without sacrificing efficiency, achieving near-original FID scores while providing substantial acceleration.
  • Figure 4: Parameter Sensitivity Analysis. Impact of $\gamma$ and $\lambda$ on FID, FPS, and hit rate. $\gamma=1.0$ and $\lambda=1.0$ provide the best balance between quality and efficiency.
  • Figure 5: Layer-wise Analysis: Offset score distribution, temporal drift heatmap, and cache hit rate per layer.
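The per-layer cache hit rate reported in Figure 5 is a ratio of reuse decisions to total decisions per layer. The sketch below assumes a hypothetical reuse rule (a layer counts as a hit when its offset score falls below a fixed `threshold`); the paper's actual reuse criterion and hit-rate definition are not given in this summary, and `log_decisions` and `per_layer_hit_rate` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def log_decisions(offset_scores: Sequence[Sequence[float]], threshold: float) -> List[Tuple[int, bool]]:
    """offset_scores[t][l] is layer l's score at step t; a hit (cache reuse) when score < threshold."""
    return [(layer, score < threshold)
            for row in offset_scores
            for layer, score in enumerate(row)]


def per_layer_hit_rate(decisions: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Fraction of timesteps at which each layer reused its cache."""
    hits: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for layer, reused in decisions:
        total[layer] += 1
        hits[layer] += int(reused)
    return {layer: hits[layer] / total[layer] for layer in sorted(total)}


scores = [[0.1, 0.5], [0.2, 0.9], [0.05, 0.3]]   # 3 timesteps x 2 layers
print(per_layer_hit_rate(log_decisions(scores, threshold=0.4)))
# {0: 1.0, 1: 0.3333333333333333}
```

Under this toy rule, raising `threshold` raises every layer's hit rate at the cost of reusing more stale features, which is the kind of quality-efficiency trade-off the sensitivity and layer-wise figures examine.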

Theorems & Definitions (2)

  • Proposition 1: Bounded Error Propagation
  • Theorem 1: Convergence