Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner
Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Ran Yi, Deli Zhao, Wenping Wang, Yong-Jin Liu
TL;DR
This work tackles the slow inference of diffusion models by introducing TimeTuner, a plug-in that optimizes the integral direction of diffusion sampling through per-interval timestep selection $\tau$, steering the sampling distribution toward the real distribution and mitigating truncation error. It provides a theoretical foundation via an upper bound on the sampling error of deterministic solvers, expressed with a generalized solver $f_{\theta,\tau}$, and presents a practical training objective $L_i(\tau_i)$ for learning better timesteps, implemented in sequential or parallel modes. Empirically, TimeTuner consistently enhances the performance of diverse training-free acceleration methods across multiple datasets and NFEs, with the largest gains at low NFEs, and it is compatible with high-resolution latent diffusion models. The approach is a lightweight plug-in that can be adopted broadly to accelerate diffusion-based generation without retraining the original models, improving both efficiency and sample quality.
Abstract
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed. Existing acceleration algorithms simplify the sampling by skipping most steps yet exhibit considerable performance degradation. By viewing the generation of diffusion models as a discretized integral process, we argue that the quality drop is partly caused by applying an inaccurate integral direction to a timestep interval. To rectify this issue, we propose a timestep tuner that helps find a more accurate integral direction for a particular interval at minimal cost. Specifically, at each denoising step, we replace the original parameterization by conditioning the network on a new timestep, pushing the sampling distribution towards the real one. Extensive experiments show that our plug-in design can be trained efficiently and boosts the inference performance of various state-of-the-art acceleration methods, especially when there are few denoising steps. For example, when using 10 denoising steps on the LSUN Bedroom dataset, we improve the FID of DDIM from 9.65 to 6.07, simply by adopting our method for a more appropriate set of timesteps. Code is available at https://github.com/THU-LYJ-Lab/time-tuner.
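To make the idea of conditioning the network on a new timestep concrete, the following is a minimal sketch of a deterministic DDIM-style sampler in which each interval can be given a replacement timestep. It is an illustration under stated assumptions, not the authors' implementation: `alpha_bar` is a toy cosine-style schedule, `toy_eps_model` is a hypothetical stand-in for a trained noise predictor, the `taus` values are hand-picked rather than learned, and keeping the update coefficients tied to the original timesteps is an assumption of this sketch.

```python
import math


def alpha_bar(t, T=1000):
    """Toy cumulative signal level in (0, 1]; an assumed schedule, not the paper's."""
    return math.cos(0.5 * math.pi * t / (T + 1)) ** 2


def toy_eps_model(x, t):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x, t)."""
    scale = (1.0 - alpha_bar(t)) ** 0.5
    return [xi * scale for xi in x]


def ddim_step(x_t, t, s, eps_model, tau=None):
    """One deterministic DDIM update from timestep t to an earlier timestep s.

    If tau is given, the network is conditioned on tau instead of t.
    The update coefficients still use t and s (an assumption of this sketch).
    """
    cond_t = t if tau is None else tau
    eps = eps_model(x_t, cond_t)
    a_t, a_s = alpha_bar(t), alpha_bar(s)
    # Predict the clean sample, then re-noise it to the level of timestep s.
    x0_pred = [(xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t)
               for xi, ei in zip(x_t, eps)]
    return [math.sqrt(a_s) * x0 + math.sqrt(1.0 - a_s) * ei
            for x0, ei in zip(x0_pred, eps)]


def sample(x_T, timesteps, eps_model, taus=None):
    """Run the sampler over a decreasing list of timesteps.

    taus, if given, holds one replacement timestep per interval.
    """
    x = list(x_T)
    for i in range(len(timesteps) - 1):
        tau = None if taus is None else taus[i]
        x = ddim_step(x, timesteps[i], timesteps[i + 1], eps_model, tau)
    return x


timesteps = [900, 600, 300, 0]
x_T = [1.0, -0.5]
baseline = sample(x_T, timesteps, toy_eps_model)
tuned = sample(x_T, timesteps, toy_eps_model, taus=[870, 560, 280])
```

Passing `taus` equal to the original timesteps recovers the baseline sampler exactly, so the replacement timesteps are the only thing that changes; in the paper they are learned per interval with a training objective rather than chosen by hand as above.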
