ETC: training-free diffusion models acceleration with Error-aware Trend Consistency

Jiajian Xie, Hubery Yin, Chen Li, Zhou Zhao, Shengyu Zhang

TL;DR

Diffusion models deliver high-quality generation but suffer from slow, multi-step sampling. The paper proposes ETC, a training-free diffusion acceleration framework that enforces trajectory consistency via a Consistent Trend Predictor and calibrates error tolerance with a model-specific threshold search. By aggregating historical denoising trends into future direction estimates and adaptively expanding the approximation window within each model's tolerance, ETC achieves a $2.65\times$ speedup over FLUX with a negligible SSIM degradation of $0.074$. It demonstrates strong gains across image, video, and audio tasks, outperforming state-of-the-art training-free baselines in both speed and consistency. A noted limitation is the conservative cap on per-iteration steps, with future work aimed at dynamic estimation of the maximum feasible approximations.

Abstract

Diffusion models have achieved remarkable generative quality but remain bottlenecked by costly iterative sampling. Recent training-free methods accelerate the diffusion process by reusing model outputs. However, these methods ignore denoising trends and lack error control for model-specific tolerance, leading to trajectory deviations under multi-step reuse and exacerbating inconsistencies in the generated results. To address these issues, we introduce Error-aware Trend Consistency (ETC), a framework that (1) introduces a consistent trend predictor that leverages the smooth continuity of diffusion trajectories, projecting historical denoising patterns into stable future directions and progressively distributing them across multiple approximation steps to achieve acceleration without deviating from the original trajectory; and (2) proposes a model-specific error tolerance search mechanism that derives corrective thresholds by identifying transition points from volatile semantic planning to stable quality refinement. Experiments show that ETC achieves a 2.65x acceleration over FLUX with a negligible loss of consistency (-0.074 SSIM).
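The two ideas in the abstract (extrapolating along an aggregated historical trend, and only reusing while the prediction error stays within a tolerance) can be illustrated with a toy sketch. This is not the paper's algorithm: the scalar "model output", the exponential weighting with a `decay` parameter, the relative-error check, and the names `predict_trend`, `sample`, `threshold`, and `max_window` are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def predict_trend(outputs: List[float], decay: float = 0.5) -> float:
    """Aggregate historical trends (differences between consecutive outputs)
    into one direction, weighting recent trends more heavily.
    The exponential weighting is a hypothetical choice, not the paper's."""
    trends = [b - a for a, b in zip(outputs, outputs[1:])]
    if not trends:
        return 0.0
    # The most recent trend gets weight 1, the one before it `decay`, and so on.
    weights = [decay ** age for age in range(len(trends) - 1, -1, -1)]
    return sum(w * t for w, t in zip(weights, trends)) / sum(weights)


def sample(model: Callable[[int], float], num_steps: int,
           threshold: float, max_window: int = 3) -> Tuple[List[float], int]:
    """Toy sampling loop. Returns (outputs, number_of_real_model_calls)."""
    outputs: List[float] = []
    calls = 0
    window = 0      # number of consecutive steps we currently allow to be approximated
    remaining = 0   # approximated steps left before the next real model call
    for step in range(num_steps):
        if remaining > 0 and len(outputs) >= 2:
            # Approximation step: extend the last output along the aggregated trend.
            outputs.append(outputs[-1] + predict_trend(outputs))
            remaining -= 1
            continue
        actual = model(step)
        calls += 1
        if len(outputs) >= 2:
            # Compare what the predictor would have produced with the real output.
            predicted = outputs[-1] + predict_trend(outputs)
            error = abs(predicted - actual) / (abs(actual) + 1e-8)
            # Within tolerance: widen the window (up to a cap). Otherwise reset it.
            window = min(window + 1, max_window) if error <= threshold else 0
            remaining = window
        outputs.append(actual)
    return outputs, calls


if __name__ == "__main__":
    # A smooth (linear) toy trajectory is easy to extrapolate, so many calls are skipped.
    outs, calls = sample(lambda s: 2.0 * s + 1.0, num_steps=12, threshold=0.05)
    print(f"{calls} real model calls for {len(outs)} steps")
```

In this sketch a smooth trajectory lets the window grow toward `max_window`, while a trajectory whose direction keeps changing resets the window and falls back to calling the model at every step. The fixed `max_window` plays the role of the conservative per-iteration cap mentioned in the TL;DR.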

Paper Structure

This paper contains 19 sections, 27 equations, 10 figures, 4 tables, 2 algorithms.

Figures (10)

  • Figure 1: Visualization of trajectory deviation and denoising error tolerance. Subfigure (a) shows that existing methods fail to follow the original denoising trajectory and reduce latent similarity. Subfigure (b) shows that the model maintains consistent results up to a certain degree of denoising error.
  • Figure 2: An overview of ETC. ETC leverages all historical model outputs to estimate future trends and dynamically adjusts approximation frequency according to each model’s error tolerance limit.
  • Figure 3: The patterns of the denoising process observed during inference with FLUX on the MSCOCO-2017 validation set. Subfigure (a) shows how the model output varies under different latent errors. Subfigure (b) illustrates the similarity of the current trend to each historical trend. Subfigure (c) depicts the stability of trend changes across different denoising stages.
  • Figure 4: Error accumulation at $\alpha=0.5$.
  • Figure 5: Comparison of visual quality with competing methods. Other methods exhibit issues such as text failure and missing details, whereas ETC achieves the best generation consistency.
  • ...and 5 more figures