Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

Hancheng Ye, Jiakang Yuan, Renqiu Xia, Xiangchao Yan, Tao Chen, Junchi Yan, Botian Shi, Bo Zhang

TL;DR

This work tackles the latency of diffusion-model denoising by introducing AdaptiveDiffusion, a training-free, prompt-aware acceleration method that adaptively reduces the number of noise-prediction steps using a third-order latent-difference criterion. The core idea is to skip noise-prediction steps, reusing the previous prediction, when the third-order latent difference $\Delta^{(3)}x$ indicates stability. The skipping decision is governed by the criterion $\xi(x_{i-1})=\|\Delta^{(3)}x_{i-1}\| \ge \delta\|\Delta x_i\|$, with a safeguard $C_{max}$ that limits error accumulation. The approach yields substantial speedups across image and video generation (on average $2\times$ to $5\times$, and up to $5.6\times$ on some tasks) while preserving output quality, outperforming fixed-acceleration baselines such as DeepCache and Adaptive DPM-Solver. This enables near real-time diffusion generation and broad applicability across models, schedulers, and modalities with minimal deployment burden.
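The criterion above can be sketched in plain Python. This is a minimal sketch, not the authors' code: latents are flattened lists of floats, the backward-difference convention $\Delta x_k = x_k - x_{k+1}$ (with $x_{k+1}$ the earlier latent in denoising order) is our assumption, and we read the inequality as the condition that forces a fresh noise prediction (a large third-order change relative to the first-order step, i.e. an unstable trajectory), so a step is skipped only when it fails. The names `third_order_criterion` and `should_predict` are hypothetical.

```python
import math
from typing import Sequence

# Assumption: latents are flattened lists of floats, ordered oldest first.


def _l2(v: Sequence[float]) -> float:
    """Euclidean norm of a flattened latent."""
    return math.sqrt(sum(a * a for a in v))


def third_order_criterion(history: Sequence[Sequence[float]], delta: float) -> bool:
    """Evaluate ||Δ^(3) x_{i-1}|| >= delta * ||Δ x_i|| on the newest latents.

    history[-1] is x_{i-1} (newest), history[-2] is x_i,
    history[-3] is x_{i+1}, history[-4] is x_{i+2}.
    Assumed backward-difference convention: Δx_k = x_k - x_{k+1}.
    Returns True when the inequality holds, read here as "run the model".
    """
    if len(history) < 4:
        return True  # not enough latents for a third-order difference yet
    x_im1, x_i, x_ip1, x_ip2 = history[-1], history[-2], history[-3], history[-4]
    # Δ^(3) x_{i-1} = x_{i-1} - 3 x_i + 3 x_{i+1} - x_{i+2}
    d3 = [a - 3 * b + 3 * c - d for a, b, c, d in zip(x_im1, x_i, x_ip1, x_ip2)]
    # Δ x_i = x_i - x_{i+1}
    d1 = [b - c for b, c in zip(x_i, x_ip1)]
    return _l2(d3) >= delta * _l2(d1)


def should_predict(history, delta: float, consecutive_skips: int, c_max: int) -> bool:
    """Hypothetical helper: force a model call once c_max skips happened in a row."""
    if consecutive_skips >= c_max:
        return True
    return third_order_criterion(history, delta)


if __name__ == "__main__":
    linear = [[0.0], [1.0], [2.0], [3.0]]  # constant velocity: Δ^(3) x = 0
    cubic = [[0.0], [1.0], [8.0], [27.0]]  # Δ^(3) x = 6, Δ x_i = 7
    print(third_order_criterion(linear, 0.5))  # False -> reuse previous prediction
    print(third_order_criterion(cubic, 0.5))   # True  -> run the model
```

A trajectory moving at constant velocity has a zero third-order difference, so under this reading its next noise prediction is reused unless the $C_{max}$ cap has been reached.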

Abstract

Diffusion models have recently achieved great success in the synthesis of high-quality images and videos. However, existing denoising techniques in diffusion models are commonly based on step-by-step noise predictions, which suffer from high computational cost, resulting in prohibitive latency for interactive applications. In this paper, we propose AdaptiveDiffusion to relieve this bottleneck by adaptively reducing the noise prediction steps during the denoising process. Our method aims to skip as many noise prediction steps as possible while keeping the final denoised results identical to those of the original full-step process. Specifically, the skipping strategy is guided by the third-order latent difference, which indicates the stability between timesteps during the denoising process and thus determines when previous noise prediction results can be reused. Extensive experiments on image and video diffusion models demonstrate that our method can significantly speed up the denoising process while generating results identical to the original process, achieving an average speedup of 2x to 5x without quality degradation.
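The reuse mechanism described in the abstract can be sketched as a denoising loop that updates the latent at every step but calls the noise-prediction model only when a skip test requires it, capping consecutive skips at `c_max`. This is a minimal sketch under stated assumptions, not the authors' implementation: `adaptive_denoise`, its callback signatures, and the toy model and update functions are hypothetical, and the skip test is passed in as a callable (for example, a third-order criterion like the one in the TL;DR).

```python
from typing import Callable, List, Optional, Sequence, Tuple

Latent = List[float]


def adaptive_denoise(
    x_init: Latent,
    num_steps: int,
    predict_noise: Callable[[Latent, int], Latent],
    update_latent: Callable[[Latent, Latent, int], Latent],
    needs_prediction: Callable[[Sequence[Latent]], bool],
    c_max: int,
) -> Tuple[Latent, int]:
    """Hypothetical adaptive sampling loop; returns (final latent, model calls).

    The latent is updated at every step; only the noise prediction may be
    skipped, in which case the cached prediction is reused.
    """
    history: List[Latent] = [x_init]
    cached_eps: Optional[Latent] = None
    consecutive_skips = 0
    model_calls = 0
    for step in range(num_steps):
        x = history[-1]
        run_model = (
            cached_eps is None  # nothing to reuse yet
            or consecutive_skips >= c_max  # safeguard against error accumulation
            or needs_prediction(history)  # skip test, e.g. a third-order criterion
        )
        if run_model:
            cached_eps = predict_noise(x, step)
            model_calls += 1
            consecutive_skips = 0
        else:
            consecutive_skips += 1  # reuse cached_eps from an earlier step
        history.append(update_latent(x, cached_eps, step))
    return history[-1], model_calls


def toy_model(x: Latent, t: int) -> Latent:
    """Assumed stand-in for a noise-prediction network: constant noise."""
    return [0.1 for _ in x]


def toy_update(x: Latent, eps: Latent, t: int) -> Latent:
    """Assumed stand-in for a scheduler step: subtract the predicted noise."""
    return [a - e for a, e in zip(x, eps)]


if __name__ == "__main__":
    _, calls_full = adaptive_denoise([1.0], 10, toy_model, toy_update, lambda h: True, 2)
    _, calls_skip = adaptive_denoise([1.0], 10, toy_model, toy_update, lambda h: False, 2)
    print(calls_full, calls_skip)  # 10 4
```

Because the toy predictor is constant, the skipping run reaches the same final latent as the full-step run with 4 instead of 10 model calls; with a real network the reused predictions are only approximations, which is why the skip test and the `c_max` cap matter.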

Paper Structure

This paper contains 42 sections, 13 equations, 12 figures, 7 tables, 2 algorithms.

Figures (12)

  • Figure 1: Different prompts may follow different denoising paths to generate a high-quality image. For Prompt 1, only 20 of the 50 steps require noise prediction to generate an almost lossless image, while Prompt 2 requires 26 of the 50 steps.
  • Figure 2: Denoising process of the proposed AdaptiveDiffusion. We design a third-order estimator (detailed in the corresponding section of the paper) that identifies redundancy between neighboring timesteps, so the noise prediction model can be skipped or run according to the estimator's output, yielding an adaptive diffusion process. Note that the timestep and text embeddings are omitted for brevity.
  • Figure 3: Different update strategies. (a) The default SDXL (Podell et al., 2023) samples 50 steps of noise prediction, each followed by a latent update. (b) Our AdaptiveDiffusion skips 25 steps of noise prediction according to the third-order estimator, while the latent is updated at all 50 steps. (c) SDXL samples 25 steps of noise prediction and latent update. (d) The default SDXL skips 25 steps of both noise prediction and latent update from its sampled 50 steps.
  • Figure 4: The relation between order-differential distributions and the searched optimal skipping path for one prompt. (a) The 1st-order noise differential distribution of the original full-step generation shows no relation to the optimal skipping path. (b) The 1st-order latent differential distribution reflects the distribution of the optimal skipping path but has no explicit mapping to skipping decisions, while the relative 2nd-order latent differential distribution shows some skipping signal in its fluctuations, though this signal is buried in its unstable magnitude. (c) The relative 3rd-order latent differential distribution shows a clearer signal for skipping decisions.
  • Figure 5: The effectiveness of the proposed third-order estimator. (a) The third-order estimated skipping path shares a similar distribution with the optimal skipping path. (b) The latent error between the full-step update path and the estimated skipping path. (c) The $\chi^2$ statistics and $p$-values between the greedy-searched paths and the third-order estimated paths at different skipping targets.
  • ...and 7 more figures
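Figures 4 and 5 compare first-, second- and third-order latent differences along a denoising trajectory. As a rough illustration, the relative k-th order series could be computed from stored latents as below. This is a sketch under our own assumptions, not the authors' analysis code: trajectories are lists of flattened latents in denoising order, differences are backward differences, and "relative" is taken to mean normalisation by the previous step's first-order difference (mirroring the TL;DR criterion). The names `kth_difference` and `relative_differences` are hypothetical.

```python
import math
from math import comb
from typing import List, Sequence


def _l2(v: Sequence[float]) -> float:
    """Euclidean norm of a flattened latent."""
    return math.sqrt(sum(a * a for a in v))


def kth_difference(traj: Sequence[Sequence[float]], n: int, k: int) -> List[float]:
    """Backward k-th order difference at list index n (traj in denoising order).

    Δ^(k) at n = sum_{j=0..k} (-1)^j * C(k, j) * traj[n - j]; for k = 3 this is
    traj[n] - 3 traj[n-1] + 3 traj[n-2] - traj[n-3].
    """
    if n - k < 0:
        raise ValueError("need at least k earlier latents")
    return [
        sum((-1) ** j * comb(k, j) * traj[n - j][d] for j in range(k + 1))
        for d in range(len(traj[n]))
    ]


def relative_differences(traj: Sequence[Sequence[float]], k: int) -> List[float]:
    """||Δ^(k) at n|| / ||Δ^(1) at n-1|| for every n where both are defined.

    The normaliser is an assumption, not the paper's stated definition.
    """
    out: List[float] = []
    for n in range(max(k, 2), len(traj)):
        num = _l2(kth_difference(traj, n, k))
        den = _l2(kth_difference(traj, n - 1, 1))
        out.append(num / den if den > 0 else math.inf)
    return out


if __name__ == "__main__":
    quadratic = [[float(t * t)] for t in range(5)]  # latents 0, 1, 4, 9, 16
    print(relative_differences(quadratic, 3))  # [0.0, 0.0]
```

Under these assumptions, a trajectory that changes quadratically has an identically zero third-order series, which is one way to see why a small third-order difference signals a locally predictable step.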