Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
Hancheng Ye, Jiakang Yuan, Renqiu Xia, Xiangchao Yan, Tao Chen, Junchi Yan, Botian Shi, Bo Zhang
TL;DR
This work tackles the latency of diffusion-model denoising by introducing AdaptiveDiffusion, a training-free, prompt-aware acceleration method that adaptively reduces the number of noise-prediction steps using a third-order latent-difference criterion. The core idea is to skip a noise prediction, reusing the previous result, when the third-order difference $\Delta^{(3)}x$ indicates that the latent is stable between timesteps. The criterion is $\xi(x_{i-1})=\|\Delta^{(3)}x_{i-1}\| \ge \delta\|\Delta x_i\|$, with a safeguard $C_{max}$ to limit accumulated error. The approach yields substantial speedups across image and video generation (average $2\times$ to $5\times$, up to $5.6\times$ on some tasks) while preserving output quality, outperforming fixed-acceleration baselines such as DeepCache and Adaptive DPM-Solver. This enables near real-time diffusion generation and applies across models, schedulers, and modalities with minimal deployment burden.
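The skipping rule can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that this summary does not confirm: differences are taken backward along the denoising order ($\Delta x_i = x_i - x_{i+1}$), a true $\xi$ means the next noise prediction must be computed (so a skip happens when $\xi$ is false), and $C_{max}$ caps the number of consecutive skips. The names `should_skip` and `l2_norm`, the flattened-list latents, and the values of `delta` and `c_max` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence

Latent = Sequence[float]  # flattened latent; a real pipeline would use tensors


def l2_norm(v: Latent) -> float:
    """Euclidean norm of a flattened latent."""
    return math.sqrt(sum(c * c for c in v))


def should_skip(history: List[Latent], delta: float,
                consecutive_skips: int, c_max: int) -> bool:
    """Decide whether the next noise prediction may be skipped.

    history holds latents in denoising order, newest last:
    [..., x_{i+2}, x_{i+1}, x_i, x_{i-1}].
    Assumption: skip when ||D3 x_{i-1}|| < delta * ||D x_i||, i.e. when the
    criterion xi from the text evaluates to false.
    """
    # Assumed safeguard: never exceed c_max skips in a row.
    if consecutive_skips >= c_max:
        return False
    # A third-order difference needs four latents.
    if len(history) < 4:
        return False
    x_ip2, x_ip1, x_i, x_im1 = history[-4:]
    # Third-order backward difference: x_{i-1} - 3 x_i + 3 x_{i+1} - x_{i+2}.
    d3 = [a - 3 * b + 3 * c - d
          for a, b, c, d in zip(x_im1, x_i, x_ip1, x_ip2)]
    # First-order difference: x_i - x_{i+1}.
    d1 = [b - c for b, c in zip(x_i, x_ip1)]
    return l2_norm(d3) < delta * l2_norm(d1)


# A latent moving along a straight line has zero third-order difference.
linear = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(should_skip(linear, delta=0.01, consecutive_skips=0, c_max=4))
```

In a denoising loop under these assumptions, a skipped step would reuse the most recent noise prediction in the scheduler update and increment the skip counter, while a computed step would reset the counter to zero.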
Abstract
Diffusion models have recently achieved great success in the synthesis of high-quality images and videos. However, existing denoising techniques in diffusion models are commonly based on step-by-step noise prediction, which incurs a high computation cost and results in prohibitive latency for interactive applications. In this paper, we propose AdaptiveDiffusion to relieve this bottleneck by adaptively reducing the number of noise prediction steps during the denoising process. Our method aims to skip as many noise prediction steps as possible while keeping the final denoised results identical to the original full-step ones. Specifically, the skipping strategy is guided by the third-order latent difference, which indicates the stability between timesteps during the denoising process and therefore whether previous noise prediction results can be reused. Extensive experiments on image and video diffusion models demonstrate that our method can significantly speed up the denoising process while generating results identical to those of the original process, achieving an average speedup of 2x to 5x without quality degradation.
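For concreteness, the third-order latent difference can be written out in terms of four consecutive latents. The expansion below assumes backward differences along the denoising order, $\Delta x_i = x_i - x_{i+1}$, an index convention that this summary does not state explicitly.

$$\Delta^{(2)}x_{i-1} = \Delta x_{i-1} - \Delta x_i = x_{i-1} - 2x_i + x_{i+1}$$

$$\Delta^{(3)}x_{i-1} = \Delta^{(2)}x_{i-1} - \Delta^{(2)}x_i = x_{i-1} - 3x_i + 3x_{i+1} - x_{i+2}$$

Under this convention the quantity depends only on latents that have already been computed, so evaluating it requires no additional noise predictions.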
