Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner

Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Ran Yi, Deli Zhao, Wenping Wang, Yong-Jin Liu

TL;DR

This work tackles the slow inference of diffusion models by introducing TimeTuner, a plug-in that optimizes the integral direction of diffusion sampling through per-interval timestep selection $\tau$, steering the sampling distribution toward the real distribution and mitigating truncation error. It provides a theoretical foundation via an upper bound on sampling error for deterministic solvers using a generalized solver $f_{\theta,\tau}$ and presents a practical training objective $L_i(\tau_i)$ to learn better timesteps, implemented in sequential or parallel modes. Empirically, TimeTuner consistently enhances the performance of diverse, training-free acceleration methods across multiple datasets and NFEs, with the largest gains at low NFEs and compatibility with high-resolution latent diffusion models. The approach offers a light-weight, plug-in improvement that can be adopted broadly to accelerate diffusion-based generation without retraining the original models, improving both efficiency and sample quality in real-world applications.

Abstract

A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed. Existing acceleration algorithms simplify the sampling by skipping most steps yet exhibit considerable performance degradation. By viewing the generation of diffusion models as a discretized integral process, we argue that the quality drop is partly caused by applying an inaccurate integral direction to a timestep interval. To rectify this issue, we propose a timestep tuner that helps find a more accurate integral direction for a particular interval at the minimum cost. Specifically, at each denoising step, we replace the original parameterization by conditioning the network on a new timestep, enforcing the sampling distribution towards the real one. Extensive experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods, especially when there are few denoising steps. For example, when using 10 denoising steps on the LSUN Bedroom dataset, we improve the FID of DDIM from 9.65 to 6.07, simply by adopting our method for a more appropriate set of timesteps. Code is available at https://github.com/THU-LYJ-Lab/time-tuner.
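
The core idea, conditioning the network on a tuned timestep $\tau$ while the solver still steps from $t$ to the next timestep, can be illustrated with a minimal sketch of a deterministic DDIM step. This is not the authors' implementation: the noise schedule `alpha_bar`, the toy network `toy_eps_model`, and the argument names `tau` / `taus` are hypothetical placeholders, and the sketch assumes the solver coefficients keep using the original timesteps while only the network input changes. How the values in `taus` are learned is not shown.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]
NoiseModel = Callable[[Vector, float], Vector]


def alpha_bar(t: float, T: float = 1000.0) -> float:
    """Hypothetical cosine-style cumulative signal level in (0, 1]."""
    return math.cos(0.5 * math.pi * t / (T + 1.0)) ** 2


def toy_eps_model(x: Vector, t: float) -> Vector:
    """Stand-in for a trained noise predictor (assumption, not a real model)."""
    return [math.sqrt(1.0 - alpha_bar(t)) * xi for xi in x]


def ddim_step(x: Vector, t: float, s: float, eps_model: NoiseModel,
              tau: Optional[float] = None) -> Vector:
    """One deterministic DDIM step from timestep t to s (s < t).

    If tau is given, the network is conditioned on tau instead of t;
    the solver coefficients still come from t and s.
    """
    eps = eps_model(x, t if tau is None else tau)
    a_t, a_s = alpha_bar(t), alpha_bar(s)
    # Predicted clean sample from the current state and the noise estimate.
    x0 = [(xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t)
          for xi, ei in zip(x, eps)]
    # Re-noise the prediction to the signal level of timestep s.
    return [math.sqrt(a_s) * x0i + math.sqrt(1.0 - a_s) * ei
            for x0i, ei in zip(x0, eps)]


def sample(x_T: Vector, timesteps: Sequence[float], eps_model: NoiseModel,
           taus: Optional[Sequence[float]] = None) -> Vector:
    """Run DDIM over a decreasing timestep list, optionally with tuned taus."""
    x = list(x_T)
    for i in range(len(timesteps) - 1):
        tau = None if taus is None else taus[i]
        x = ddim_step(x, timesteps[i], timesteps[i + 1], eps_model, tau)
    return x


if __name__ == "__main__":
    ts = [1000.0, 500.0, 0.0]
    print(sample([1.0, -0.5], ts, toy_eps_model))                       # baseline
    print(sample([1.0, -0.5], ts, toy_eps_model, taus=[900.0, 450.0]))  # tuned
```

Passing `taus` equal to the original timesteps reproduces the baseline sampler exactly, which is why a design of this kind can act as a plug-in: the pretrained network and the solver are untouched, and only one scalar per interval changes.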

Paper Structure

This paper contains 13 sections, 2 theorems, 10 equations, 7 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

Assume that $\boldsymbol\epsilon_\theta$ is the ground-truth noise prediction model, with $\|\boldsymbol\epsilon_\theta(\mathbf x,t)-\boldsymbol\epsilon_\theta(\mathbf y,t)\|_2\geqslant\frac{1}{C}\|\mathbf x-\mathbf y\|_2$ for any $t$ and some $C>0$. Denote by $\mathbf x_{t_i}^{gt}$ the ground-truth

Figures (7)

  • Figure 1: Conceptual illustration of (a) the one-step truncation error, (b) the accumulated truncation error, and (c) how our TimeTuner pushes the sampling distribution towards the real distribution by replacing the input timestep $t$ with $\tau$. Shown are the full-step reverse process (gray dashed line), the baseline acceleration pipeline (red line), and our proposed method with the timestep tuner (green line).
  • Figure 2: Quantitative measurement of the gap between the real and sampling distributions using DDIM and DPM-Solver-2. The horizontal axis represents timesteps forming (a) a quadratic trajectory with NFE $=10$; (b) a quadratic trajectory with NFE $=20$; (c) a uniform trajectory with NFE $=10$; (d) a log-SNR trajectory with NFE $=10$. We plot the $L_2$ distance between $\mathbf x_t$ and $\widetilde{\mathbf x}_t$ for the original and the timestep-tuned sampler, shown in red and blue, respectively. We also provide a theoretical error bound for deterministic samplers in Theorem 2.
  • Figure 3: Quantitative comparison measured by $\log$ FID $\downarrow$ on CIFAR10 and CelebA, under original DDPM. All are evaluated with different NFEs on the horizontal axis. We apply quadratic trajectory for DDIM and DDPM, uniform trajectory for Analytic-DDIM and Analytic-DDPM, log-SNR trajectory for DPM-Solver-2.
  • Figure 4: Quantitative comparison measured by $\log$ FID $\downarrow$ on LSUN Bedroom, FFHQ, and CelebA-HQ, under LDM. All are evaluated with different NFEs on the horizontal axis. We apply uniform trajectory for DDIM and DDPM, and log-SNR trajectory for DPM-Solver-2.
  • Figure 5: Quantitative comparison measured by $\log$ FID $\downarrow$ on CIFAR10, ImageNet, and MS-COCO, under EDM and LDM. All are evaluated with different NFEs on the horizontal axis. We apply the originally designed trajectory for EDM and linear trajectory for LDM.
  • ...and 2 more figures
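
The captions above refer to uniform and quadratic timestep trajectories. As a rough illustration of what such trajectories look like, the sketch below builds both for a given number of steps. The function names, the default `T = 1000`, and the exact spacing formulas are assumptions chosen for clarity and may differ from the conventions used in the paper's experiments; the log-SNR trajectory is omitted because it depends on the noise schedule.

```python
from typing import List


def uniform_trajectory(num_steps: int, T: int = 1000) -> List[int]:
    """num_steps + 1 evenly spaced timesteps from T down to 0."""
    return [round(T * (num_steps - i) / num_steps)
            for i in range(num_steps + 1)]


def quadratic_trajectory(num_steps: int, T: int = 1000) -> List[int]:
    """Timesteps evenly spaced in sqrt(t), so intervals shrink near t = 0."""
    return [round(T * ((num_steps - i) / num_steps) ** 2)
            for i in range(num_steps + 1)]


if __name__ == "__main__":
    print(uniform_trajectory(10))
    print(quadratic_trajectory(10))
```

With 10 steps, the uniform trajectory uses a constant interval of 100, while the quadratic one starts with an interval of 190 and ends with an interval of 10, spending more of the budget at low noise levels.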

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2