PFDiff: Training-Free Acceleration of Diffusion Models Combining Past and Future Scores

Guangyi Wang, Yuren Cai, Lijiang Li, Wei Peng, Songzhi Su

TL;DR

PFDiff introduces a training-free, orthogonal approach to accelerate diffusion model sampling by combining past- and future-score guidance for timestep skipping. It replaces current score computations with a past 'springboard' and forecasts a future score to correct discretization errors in first-order solvers, enabling efficient $k$-step skips with minimal NFEs. The method is theoretically grounded through trajectory-based analysis and convergence arguments and is validated across unconditional and conditional pre-trained DPMs, showing substantial FID improvements at low NFEs while maintaining comparable inference time. This approach broadens the practicality of fast diffusion sampling, particularly for large pre-trained, conditionally guided models.

Abstract

Diffusion Probabilistic Models (DPMs) have shown remarkable potential in image generation, but their sampling efficiency is hindered by the need for numerous denoising steps. Most existing solutions accelerate the sampling process by proposing fast ODE solvers. However, the inevitable discretization errors of the ODE solvers are significantly magnified when the number of function evaluations (NFE) is small. In this work, we propose PFDiff, a novel training-free and orthogonal timestep-skipping strategy, which enables existing fast ODE solvers to operate with fewer NFE. Specifically, PFDiff first uses score replacement from past time steps to predict a "springboard". It then uses this "springboard", together with foresight updates inspired by Nesterov momentum, to rapidly update the current intermediate states. This approach reduces unnecessary NFE while correcting for the discretization errors inherent in first-order ODE solvers. Experimental results demonstrate that PFDiff applies flexibly across various pre-trained DPMs, particularly excelling in conditional DPMs and surpassing previous state-of-the-art training-free methods. For instance, using DDIM as a baseline, we achieved 16.46 FID (4 NFE) compared to 138.81 FID with DDIM on ImageNet 64x64 with classifier guidance, and 13.06 FID (10 NFE) on Stable Diffusion with a guidance scale of 7.5. Code is available at https://github.com/onefly123/PFDiff.
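The "foresight" idea that the abstract attributes to Nesterov momentum can be illustrated outside of diffusion models. The following is a minimal sketch of classical Nesterov accelerated gradient descent on a one-dimensional quadratic, not PFDiff itself: the gradient is evaluated at a look-ahead point reached by reusing the past velocity, which loosely parallels evaluating the score at a "springboard" reached by reusing a past score. The function name `nesterov_minimize`, the objective, and all hyperparameters are illustrative assumptions.

```python
def nesterov_minimize(grad, theta, lr=0.1, momentum=0.9, steps=200):
    """Nesterov accelerated gradient for a scalar parameter (illustrative only)."""
    velocity = 0.0
    for _ in range(steps):
        # Look ahead along the past velocity before evaluating the gradient.
        lookahead = theta + momentum * velocity
        # Use the gradient at the look-ahead point to update the velocity.
        velocity = momentum * velocity - lr * grad(lookahead)
        theta += velocity
    return theta


# Hypothetical objective f(theta) = 0.5 * (theta - 3)^2, whose gradient is theta - 3.
theta_star = nesterov_minimize(lambda th: th - 3.0, theta=0.0)
```

In this analogy the past velocity plays the role of the buffered past score and the look-ahead gradient plays the role of the future score; the correspondence is only suggestive.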

Paper Structure

This paper contains 37 sections, 1 theorem, 26 equations, 13 figures, 11 tables, 3 algorithms.

Key Result

Proposition 3.1

For any given first-order DPM ODE solver, the absolute values of the coefficients of the higher-order derivative terms in Eq. (15) are smaller when using the score at the future time point $r=\varepsilon$ than when using the score at the current time point $r=t_{i-1}$, as follows (proof in the appendix):

Figures (13)

  • Figure 1: (a) The trend of the MSE of the noise network output $\epsilon_\theta(x_t,t)$ over time step size $\Delta t$, where $\eta$ comes from $\bar{\sigma}_{t}$ in Eq. (6). Solid lines: ODE solvers; dashed lines: SDE solvers. (b) MSE of the state updated separately using the "springboard" $\tilde{x}_{t_{i+h}}$ and the future score $\epsilon_\theta (\tilde{x}_{t_{i+h}}, t_{i+h})$, relative to the sampling process with 1000 NFE, given by $\| \tilde{x}_{t_{i+(k+1)}}- \tilde{x}^{gt}_{t_{i+(k+1)}} \| ^2$. (c) Comparison of partial sampling trajectories between PFDiff-1 and a first-order ODE solver, where the update directions are guided by the tangent direction of the sampling trajectories.
  • Figure 2: Illustration of a single update iteration of PFDiff-$k$_$h$ combined with any first-order ODE solver $\phi$. Given specific values of $k$ and $h$ ($k \le 3$, $h \le k$), PFDiff first uses the past score $Q$, stored in the Buffer from the previous iteration, in place of the current score to update to the "springboard" $x_{t_{i+h}}$; the future score is then computed at the "springboard"; finally, the future score replaces the current score to complete a full update iteration. The future score is also passed to the next iteration, where it serves as the "past" score.
  • Figure 3: Unconditional sampling results. We report the FID$\downarrow$ for different methods by varying the number of function evaluations (NFE), evaluated on 50k samples.
  • Figure 4: Conditional sampling results. We report the FID$\downarrow$ for different methods by varying the NFE, evaluated on 50k samples for ImageNet 64x64 and on 10k samples for the others. ${}^{\ast}$AutoDiffusion requires additional search costs. ${}^{\dagger}$We borrow the results reported in DPM-Solver-v3 directly.
  • Figure 5: The trend (mean and standard deviation (std)) of accumulated truncation error over time step $t$ on the CIFAR10 dataset, relative to DDIM with 1000 NFE, varying the number of function evaluations (NFE) $\in \{6,10,20\}$.
  • ...and 8 more figures
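
The single-iteration update described in the Figure 2 caption can be sketched in a few lines. The code below is a toy reading of that caption, not the authors' implementation: it uses the standard deterministic DDIM update as the first-order solver $\phi$, a scalar state, and an analytic noise predictor for Gaussian data so that it runs without a trained network. The names `ToyEps`, `alpha_bar`, `MU`, `S`, the noise schedule, and the warm-up step that fills the buffer before the first iteration are all assumptions made for illustration.

```python
import math

MU, S = 2.0, 0.5  # hypothetical toy data distribution N(MU, S^2)


def alpha_bar(t):
    # Hypothetical noise schedule: equals 1 at t=0 and is close to 0 at t=1.
    return math.exp(-6.0 * t * t)


class ToyEps:
    """Analytic noise predictor for Gaussian data; counts evaluations (NFE)."""

    def __init__(self):
        self.nfe = 0

    def __call__(self, x, t):
        self.nfe += 1
        a = alpha_bar(t)
        return math.sqrt(1.0 - a) * (x - math.sqrt(a) * MU) / (a * S * S + 1.0 - a)


def ddim_step(eps, x, t, s):
    """Deterministic DDIM update from time t to time s, used here as phi."""
    a_t, a_s = alpha_bar(t), alpha_bar(s)
    x0_pred = (x - math.sqrt(1.0 - a_t) * eps) / math.sqrt(a_t)
    return math.sqrt(a_s) * x0_pred + math.sqrt(1.0 - a_s) * eps


def sample_ddim(eps_model, x, ts):
    """Baseline: one score evaluation per grid interval."""
    for t, s in zip(ts[:-1], ts[1:]):
        x = ddim_step(eps_model(x, t), x, t, s)
    return x


def sample_pfdiff(eps_model, x, ts, k=1, h=1):
    """PFDiff-k_h style loop as read from the Figure 2 caption (a sketch)."""
    assert 1 <= h <= k and (len(ts) - 2) % (k + 1) == 0
    # Assumed warm-up: one ordinary step so the buffer holds a "past" score.
    q = eps_model(x, ts[0])
    x = ddim_step(q, x, ts[0], ts[1])
    i = 1
    while i + k + 1 < len(ts):
        # Past score replaces the current score: jump to the springboard.
        spring = ddim_step(q, x, ts[i], ts[i + h])
        # Future score evaluated at the springboard (the only NFE this iteration).
        q = eps_model(spring, ts[i + h])
        # Future score replaces the current score: move x from t_i to t_{i+(k+1)}.
        x = ddim_step(q, x, ts[i], ts[i + k + 1])
        i += k + 1
    return x


def exact_x0(x_T, T):
    """Exact probability-flow solution for this Gaussian toy problem."""
    a = alpha_bar(T)
    z = (x_T - math.sqrt(a) * MU) / math.sqrt(a * S * S + 1.0 - a)
    return MU + S * z
```

On a 10-point time grid such as `ts = [1.0 - j / 9 for j in range(10)]`, `sample_ddim` performs 9 score evaluations, while `sample_pfdiff` with `k=1, h=1` performs 5 under the warm-up assumed here: one to fill the buffer and one per iteration, each of which advances two grid points. Comparing either result against `exact_x0` gives a simple way to inspect discretization error on this toy problem.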

Theorems & Definitions (1)

  • Proposition 3.1