Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration

Jiaqi Han; Juntong Shi; Puheng Li; Haotian Ye; Qiushan Guo; Stefano Ermon

TL;DR

This work proposes the spectral diffusion feature forecaster (Spectrum), a training-free approach that enables global, long-range feature reuse with tightly controlled error. The authors show theoretically that the approach admits more favorable long-horizon behavior and yields an error bound that does not compound with the step size.

Abstract

Diffusion models have become the dominant tool for high-fidelity image and video generation, yet are critically bottlenecked by their inference speed due to the numerous iterative passes of Diffusion Transformers. To reduce this heavy compute, recent works resort to feature caching and reuse, which skips network evaluations at selected diffusion steps by using features cached at previous steps. However, their preliminary design relies solely on local approximation, causing errors to grow rapidly with large skips and leading to degraded sample quality at high speedups. In this work, we propose the spectral diffusion feature forecaster (Spectrum), a training-free approach that enables global, long-range feature reuse with tightly controlled error. In particular, we view the latent features of the denoiser as functions over time and approximate them with Chebyshev polynomials. Specifically, we fit the coefficient of each basis function via ridge regression and then use the fitted expansion to forecast features at multiple future diffusion steps. We show theoretically that our approach admits more favorable long-horizon behavior and yields an error bound that does not compound with the step size. Extensive experiments on various state-of-the-art image and video diffusion models consistently verify the superiority of our approach. Notably, we achieve up to 4.79$\times$ speedup on FLUX.1 and 4.67$\times$ speedup on Wan2.1-14B, while maintaining much higher sample quality compared with the baselines.
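The fitting-and-forecasting idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`fit_chebyshev_ridge`, `forecast_features`), the time range `[t_min, t_max]`, and the toy polynomial features are all assumptions made for illustration. The parameters `degree` and `lam` are meant to play the roles of the polynomial degree $M$ and the regularization weight $\lambda$ mentioned in the ablation figures.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def to_unit_interval(t, t_min, t_max):
    """Map times in [t_min, t_max] affinely onto [-1, 1], the Chebyshev domain."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0


def fit_chebyshev_ridge(times, feats, degree, lam, t_min=0.0, t_max=1.0):
    """Fit Chebyshev coefficients to cached features with ridge regression.

    times: (K,) time points at which the network was actually evaluated.
    feats: (K, D) cached feature vectors, one row per time point.
    Returns an array of shape (degree + 1, D), one coefficient row per basis.
    """
    x = to_unit_interval(times, t_min, t_max)
    phi = C.chebvander(x, degree)                 # (K, degree + 1) design matrix
    gram = phi.T @ phi + lam * np.eye(degree + 1)  # ridge-regularized normal equations
    return np.linalg.solve(gram, phi.T @ feats)


def forecast_features(coef, t_new, t_min=0.0, t_max=1.0):
    """Evaluate the fitted expansion at one or more future time points."""
    x = to_unit_interval(t_new, t_min, t_max)
    phi = C.chebvander(x, coef.shape[0] - 1)
    return phi @ coef                              # (len(t_new), D)


# Toy demo (hypothetical data): features that are cubic polynomials in time.
observed_t = np.linspace(0.0, 0.6, 7)
observed_f = np.stack(
    [1 + 2 * observed_t, observed_t**2 - observed_t, observed_t**3], axis=1
)
coef = fit_chebyshev_ridge(observed_t, observed_f, degree=3, lam=1e-10)

future_t = np.array([0.7, 0.8])
predicted = forecast_features(coef, future_t)
exact = np.stack([1 + 2 * future_t, future_t**2 - future_t, future_t**3], axis=1)
print("max forecast error:", np.abs(predicted - exact).max())
```

The regularization weight is tiny here only because the toy data is noise-free and exactly polynomial. With real denoiser features, a larger `lam` would trade fit accuracy for stability, and the cached rows would be flattened network activations rather than three hand-written functions.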

Paper Structure

This paper contains 23 sections, 3 theorems, 33 equations, 11 figures, 8 tables, and 1 algorithm.

Introduction
Related Work
Method
Feature Forecasting for Diffusion Acceleration
From Local to Global: Bounding Forecasting Errors with Chebyshev Polynomials
Spectral Diffusion Feature Forecasting
Analysis and Extensions
Experiments
Accelerating Text-to-Image Diffusion Models
Accelerating Text-to-Video Diffusion Models
Ablation Studies
Conclusion
Proofs
Proof of Theorem 3.1
Proof of Theorem 3.2
...and 8 more sections

Key Result

Theorem 3.1

Fix an expansion point $\tau_k\in[0,1]$ and a target $\tau_j=\tau_k+ (j - k) \delta_t$. Consider the smoothness class […]. Let $T_P[f](\tau_j)$ denote the ideal order-$P$ Taylor predictor of $f(\tau_j)$ centered at $\tau_k$, using the exact derivatives $f^{(p)}(\tau_k)$, $p\le P$. Then […]
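For reference, the predictor named in Theorem 3.1 is the standard Taylor polynomial. Writing $h = \tau_j - \tau_k = (j-k)\delta_t$, and assuming for this sketch that $f$ is scalar-valued and $(P+1)$-times continuously differentiable between $\tau_k$ and $\tau_j$:

$$
T_P[f](\tau_j) = \sum_{p=0}^{P} \frac{f^{(p)}(\tau_k)}{p!}\, h^{p},
\qquad
\bigl| f(\tau_j) - T_P[f](\tau_j) \bigr| \le \frac{|h|^{P+1}}{(P+1)!} \max_{\xi \text{ between } \tau_k,\, \tau_j} \bigl| f^{(P+1)}(\xi) \bigr| .
$$

The second expression is the classical Lagrange remainder bound, given here only as background. It is not necessarily the bound stated in the theorem, which this summary does not reproduce. It does illustrate why a local predictor's error grows with the skip length $|j-k|$.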

Figures (11)

Figure 1: Qualitative comparison on text-to-image generation using FLUX.1. Spectrum aligns consistently with the 50-step reference while accelerating it by a factor of 4.79$\times$. Other baselines show noticeable degradation in color and prompt consistency.
Figure 2: Qualitative comparisons on HunyuanVideo. Spectrum achieves higher sample fidelity while delivering more speedup.
Figure 3: Qualitative comparison on text-to-video generation using Wan2.1-14B. Spectrum aligns consistently with the high-quality 50-step reference using only 14 network evaluations, while TaylorSeer is slower and exhibits noticeable artifacts on character and background.
Figure 4: Ablation study on the regularization weight $\lambda$.
Figure 5: Ablation on the degree of Chebyshev polynomials $M$.
...and 6 more figures

Theorems & Definitions (5)

Theorem 3.1: Worst-case error for local order-$P$ Taylor
Theorem 3.2: Universality of Chebyshev Polynomials
Theorem 3.3: Error Bound of Spectrum
proof
proof
