Denoising as Path Planning: Training-Free Acceleration of Diffusion Models with DPCache

Bowen Cui, Yuanbin Wang, Huajiang Xu, Biaolong Chen, Aixi Zhang, Hao Jiang, Zhengzheng Jin, Xu Liu, Pipei Huang

TL;DR

DPCache is a novel training-free acceleration framework that formulates diffusion sampling acceleration as a global path planning problem, and employs dynamic programming to select an optimal sequence of key timesteps that minimizes the total path cost while preserving trajectory fidelity.

Abstract

Diffusion models have demonstrated remarkable success in image and video generation, yet their practical deployment remains hindered by the substantial computational overhead of multi-step iterative sampling. Among acceleration strategies, caching-based methods offer a training-free and effective solution by reusing or predicting features across timesteps. However, existing approaches rely on fixed or locally adaptive schedules without considering the global structure of the denoising trajectory, often leading to error accumulation and visual artifacts. To overcome this limitation, we propose DPCache, a novel training-free acceleration framework that formulates diffusion sampling acceleration as a global path planning problem. DPCache constructs a Path-Aware Cost Tensor from a small calibration set to quantify the path-dependent error of skipping timesteps conditioned on the preceding key timestep. Leveraging this tensor, DPCache employs dynamic programming to select an optimal sequence of key timesteps that minimizes the total path cost while preserving trajectory fidelity. During inference, the model performs full computations only at these key timesteps, while intermediate outputs are efficiently predicted using cached features. Extensive experiments on DiT, FLUX, and HunyuanVideo demonstrate that DPCache achieves strong acceleration with minimal quality loss, outperforming prior acceleration methods by $+$0.031 ImageReward at 4.87$\times$ speedup and even surpassing the full-step baseline by $+$0.028 ImageReward at 3.54$\times$ speedup on FLUX, validating the effectiveness of our path-aware global scheduling framework. Code will be released at https://github.com/argsss/DPCache.
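The schedule-selection step described in the abstract can be sketched as a small dynamic program. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `select_key_timesteps` and `toy_cost` are hypothetical. So is the tensor layout, in which `cost[i, j, k]` is the error of jumping from key step `j` to key step `k` when the preceding key step was `i`. The convention `cost[0, 0, k]` for the first jump and the requirement that the path start at step 0 and end at the last step are also assumptions. The toy cost is synthetic rather than measured on a calibration set.

```python
import numpy as np


def toy_cost(T):
    """Synthetic path-dependent cost (an assumption, not calibration data).

    Penalizes long jumps and changes in jump length relative to the
    previous jump, so the cost of j -> k depends on the earlier key step i.
    """
    i, j, k = np.meshgrid(np.arange(T), np.arange(T), np.arange(T), indexing="ij")
    return ((k - j) ** 2 + 0.5 * np.abs((k - j) - (j - i))).astype(float)


def select_key_timesteps(cost, num_keys):
    """Pick `num_keys` key steps from 0..T-1 that minimize the total path cost.

    The path is assumed to start at step 0 and end at step T-1.
    Returns (list of key step indices, total cost).
    """
    T = cost.shape[0]
    if not 2 <= num_keys <= T:
        raise ValueError("num_keys must be between 2 and T")

    # dp[m, j, k]: minimal cost of a path with m key steps whose last two are j, k.
    dp = np.full((num_keys + 1, T, T), np.inf)
    # parent[m, j, k]: the key step preceding j on that best path (for backtracking).
    parent = np.full((num_keys + 1, T, T), -1, dtype=int)

    # Base case: two key steps, 0 -> k, with no earlier key step.
    for k in range(1, T):
        dp[2, 0, k] = cost[0, 0, k]

    for m in range(3, num_keys + 1):
        for j in range(1, T):
            for k in range(j + 1, T):
                # Try every possible predecessor i < j of the pair (j, k).
                cand = dp[m - 1, :j, j] + cost[:j, j, k]
                i = int(np.argmin(cand))
                if cand[i] < dp[m, j, k]:
                    dp[m, j, k] = cand[i]
                    parent[m, j, k] = i

    # Best final pair (prev, last) among paths that end at the last step.
    last = T - 1
    prev = int(np.argmin(dp[num_keys, :, last]))
    total = float(dp[num_keys, prev, last])

    # Backtrack through the parent table to recover the full schedule.
    path = [last, prev]
    m, j, k = num_keys, prev, last
    while m > 2:
        i = int(parent[m, j, k])
        path.append(i)
        m, j, k = m - 1, i, j
    return path[::-1], total


if __name__ == "__main__":
    schedule, total = select_key_timesteps(toy_cost(7), num_keys=4)
    print(schedule, total)
```

Because the state tracks the last two key steps rather than only the last one, the recurrence can use a cost that depends on the preceding key step, which is the property the abstract attributes to the Path-Aware Cost Tensor. On the synthetic cost above, the selected schedule is evenly spaced, since that cost penalizes both long jumps and changes in jump length.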

Paper Structure (24 sections, 8 equations, 12 figures, 6 tables, 1 algorithm)

This paper contains 24 sections, 8 equations, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: (a) Fixed schedule is inflexible and unable to identify critical timesteps, resulting in large deviation from the true trajectory. (b) Locally adaptive schedule makes greedy, short-sighted decisions that often skip essential timesteps, leading to irreversible deviation. (c) DPCache identifies a globally optimal sequence of key timesteps through calibration and achieves low cumulative trajectory deviation.
  • Figure 2: Overview of DPCache. (a) During the calibration stage, the full $T$-step denoising process is executed to construct a 3D Path-Aware Cost Tensor (PACT), which quantifies the cumulative error of skipping intermediate timesteps conditioned on the preceding key step. (b) An optimal $K$-step sampling schedule ($K < T$) is selected via dynamic programming, leveraging the PACT to maintain a DP table to store minimal cumulative costs and a path table to enable backtracking. (c) During inference, the model computes and caches features only at the preselected key timesteps, while features at other timesteps are efficiently predicted.
  • Figure 3: Qualitative comparison of different acceleration methods applied to FLUX.1-dev on DrawBench dataset.
  • Figure 4: Qualitative comparison of different acceleration methods applied to HunyuanVideo on VBench dataset.
  • Figure 5: Visualization of sampling trajectories when applying different acceleration methods to FLUX.1-dev.
  • ...and 7 more figures
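The inference stage described in the Figure 2(c) caption can be illustrated with a toy loop. This is a sketch under stated assumptions, not the paper's predictor. The function `run_cached_sampling` and its parameters are hypothetical. The first-order linear extrapolation from the two most recent key-step features is a stand-in chosen for simplicity, since the text above does not say how cached features are used for prediction. The loop also ignores the latent update that a real sampler performs at each step.

```python
import numpy as np


def run_cached_sampling(full_forward, num_steps, key_steps):
    """Run `num_steps` steps, calling `full_forward` only at key steps.

    Non-key steps are predicted from cached key-step features:
    reuse when only one feature is cached, otherwise linear extrapolation
    (a stand-in predictor, assumed for illustration).
    Returns (list of per-step features, number of full forward passes).
    """
    key_steps = set(key_steps)
    history = []  # up to two most recent (timestep, feature) pairs from key steps
    outputs, full_calls = [], 0

    for t in range(num_steps):
        if t in key_steps or not history:
            # Full computation: evaluate the model and refresh the cache.
            feat = np.asarray(full_forward(t), dtype=float)
            history = (history + [(t, feat)])[-2:]
            full_calls += 1
        elif len(history) == 1:
            # Only one cached feature so far: reuse it unchanged.
            feat = history[0][1]
        else:
            # Extrapolate linearly from the two most recent key steps.
            (t0, f0), (t1, f1) = history
            feat = f1 + (f1 - f0) * (t - t1) / (t1 - t0)
        outputs.append(feat)

    return outputs, full_calls


if __name__ == "__main__":
    # Hypothetical stand-in for a model forward pass: a feature linear in t.
    feats, calls = run_cached_sampling(lambda t: np.array([2.0 * t + 1.0]), 7, [0, 2, 4, 6])
    print(calls, [float(f[0]) for f in feats])
```

In this toy setting the full forward pass runs only at the four key steps. The predicted steps are exact once two key-step features are cached, because the stand-in feature is linear in the step index. Real features are not linear, which is why the choice of key steps matters.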