ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration

Fanpu Cao, Yaofo Chen, Zeng You, Wei Luo, Cen Chen

TL;DR

The paper tackles the high computational cost of diffusion transformers by exploiting temporal redundancy in a training-free manner. It introduces ProCache, a two-part framework consisting of constraint-aware non-uniform caching pattern search and selective computation to refresh critical semantics, aligning caching with the non-uniform temporal dynamics of diffusion denoising. Through offline pattern search and lightweight updates, ProCache achieves up to 2.90x acceleration on DiT models with negligible quality degradation, outperforming prior caching-based methods across multiple datasets and tasks. This approach has strong practical implications for real-time diffusion-based generation and broad applicability to large transformer-based diffusion models.

Abstract

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. While feature caching offers a promising training-free acceleration solution by exploiting temporal redundancy, existing methods suffer from two key limitations: (1) uniform caching intervals fail to align with the non-uniform temporal dynamics of DiT, and (2) naive feature reuse with excessively large caching intervals can lead to severe error accumulation. In this work, we analyze the evolution of DiT features during denoising and reveal that both feature changes and error propagation are highly time- and depth-varying. Motivated by this, we propose ProCache, a training-free dynamic feature caching framework that addresses these issues via two core components: (i) a constraint-aware caching pattern search module that generates non-uniform activation schedules through offline constrained sampling, tailored to the model's temporal characteristics; and (ii) a selective computation module that, within cached segments, recomputes only deep blocks and high-importance tokens to mitigate error accumulation with minimal overhead. Extensive experiments on PixArt-alpha and DiT demonstrate that ProCache achieves up to 1.96x and 2.90x acceleration, respectively, with negligible quality degradation, significantly outperforming prior caching-based methods.
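To make the idea of an offline search over non-uniform caching schedules more concrete, here is a minimal sketch on a toy scalar "denoiser". It is not the authors' implementation. The names `is_valid`, `run`, and `search` are hypothetical, and so are the three constraints used here (compute the first step, a fixed budget of full steps, a bounded run of cached steps), since this summary does not state the paper's constraints. The score is the deviation from the uncached result, standing in for a quality metric such as FID.

```python
import random
from typing import Callable, List, Tuple

Schedule = List[bool]  # True = full compute step, False = reuse cached output


def is_valid(schedule: Schedule, budget: int, max_gap: int) -> bool:
    """Hypothetical constraints: first step computed, fixed budget, bounded gap."""
    if not schedule or not schedule[0] or sum(schedule) != budget:
        return False
    gap = 0
    for compute in schedule:
        gap = 0 if compute else gap + 1
        if gap > max_gap:
            return False
    return True


def run(step_fn: Callable[[float, int], float], x0: float, schedule: Schedule) -> float:
    """Toy denoising loop: cached steps reuse the last computed update."""
    x, cached = x0, 0.0
    for t, compute in enumerate(schedule):
        if compute:
            cached = step_fn(x, t)  # stands in for the expensive forward pass
        x = x + cached
    return x


def search(
    step_fn: Callable[[float, int], float],
    x0: float,
    num_steps: int,
    budget: int,
    max_gap: int,
    trials: int = 200,
    seed: int = 0,
) -> Tuple[Schedule, float]:
    """Sample valid schedules and keep the one closest to the uncached result."""
    rng = random.Random(seed)
    target = run(step_fn, x0, [True] * num_steps)
    best, best_err = None, float("inf")
    for _ in range(trials):
        # Step 0 is always computed; the remaining full steps are sampled.
        on = {0} | set(rng.sample(range(1, num_steps), budget - 1))
        schedule = [t in on for t in range(num_steps)]
        if not is_valid(schedule, budget, max_gap):
            continue
        err = abs(run(step_fn, x0, schedule) - target)
        if err < best_err:
            best, best_err = schedule, err
    if best is None:
        raise ValueError("no valid schedule found; relax the constraints")
    return best, best_err


# Toy dynamics that change quickly in early steps and slowly afterwards.
def toy_step(x: float, t: int) -> float:
    return -x * (0.3 if t < 4 else 0.05)


schedule, err = search(toy_step, 1.0, num_steps=12, budget=6, max_gap=3)
print(schedule, err)
```

Because the search runs offline, its cost is paid once per model and step count; at inference time only the chosen schedule is consulted.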

Paper Structure

This paper contains 24 sections, 7 equations, 11 figures, 8 tables, 2 algorithms.

Figures (11)

  • Figure 1: Evolution of relative L1 error across diffusion steps in DiT blocks, computed from 10 samples on PixArt-$\alpha$. Errors grow progressively, with deeper blocks (e.g., Blocks 25–28) showing significantly higher magnitudes than shallower ones (e.g., Blocks 1–4), highlighting the non-uniform error accumulation across network depths.
  • Figure 2: Overall pipeline of ProCache. 1) ProCache explores valid caching patterns under three principled constraints and selects the optimal strategy via lightweight offline sampling based on quality metrics (e.g., FID). 2) It then inserts partial recomputation within contiguous cache steps, selectively updating high-importance tokens in deeper layers.
  • Figure 3: Output L1 error between the current step and the previous step in DiT-XL/2 across the diffusion process.
  • Figure 4: Relative L1 error of features across different blocks in DiT-XL/2, measured at step 20.
  • Figure 5: Image generation samples at 1024 $\times$ 1024 resolution under a 1.56$\times$ speed-up ratio.
  • ...and 6 more figures
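
Several of the captions above report a relative L1 error between features. This summary does not give the formula, so the sketch below assumes the common definition, the L1 norm of the difference divided by the L1 norm of the reference. The function name `relative_l1_error` and the `eps` guard are hypothetical, and features are flattened to plain lists of floats for simplicity.

```python
from typing import Sequence


def relative_l1_error(
    reference: Sequence[float],
    approx: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Assumed definition: sum|approx - reference| / sum|reference|."""
    if len(reference) != len(approx):
        raise ValueError("feature vectors must have the same length")
    num = sum(abs(a - r) for r, a in zip(reference, approx))
    den = sum(abs(r) for r in reference)
    return num / (den + eps)  # eps avoids division by zero for all-zero references


# Compare a fully computed feature with a cached (stale) copy of it.
full = [1.0, -2.0, 1.0]
cached = [1.5, -2.0, 0.5]
print(relative_l1_error(full, cached))
```

Normalizing by the reference magnitude makes errors comparable across blocks whose features differ in scale, which matters when comparing shallow and deep blocks.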