InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models
Zihao Wu
TL;DR
InvarDiff is a training-free cross-scale caching method that accelerates inference by exploiting cross-timestep and cross-layer feature invariance in deterministic diffusion sampling. It derives a binary cache plan and a step-level gate through a two-phase calibration with resampling correction. At inference it applies step-first, then layer-wise, scheduling to reuse computation, with no retraining or architectural changes. The approach yields substantial end-to-end speedups (up to about 3×) on DiT-family backbones such as FLUX and DiT-XL/2 while preserving perceptual quality, and it adapts to DiT-style variants. The results demonstrate robust, transferable acceleration that complements existing speedup techniques and can be extended to high-resolution image and video generation pipelines.
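Step-first, then layer-wise, scheduling can be illustrated with a minimal Python sketch. It assumes a precomputed boolean cache plan of shape (T, L, M) and a boolean step gate of length T. The class name `CachedRunner`, the toy module callables, and the choice to cache each module's residual-branch output and add it back to the current hidden state are hypothetical simplifications, not the paper's implementation. The resampling correction is not modeled.

```python
import numpy as np


class CachedRunner:
    """Toy step-first, then layer-wise, cache scheduler (hypothetical sketch)."""

    def __init__(self, layers, plan, gate):
        # layers: list of L layers, each a list of M module callables h -> residual.
        # plan:   bool array (T, L, M); True = reuse module (l, m) at step t.
        # gate:   bool array (T,);      True = reuse the whole previous step output.
        self.layers, self.plan, self.gate = layers, plan, gate
        self.module_cache = {}   # (l, m) -> most recently computed module output
        self.step_cache = None   # most recent full-network output
        self.computed = 0        # number of module evaluations actually executed

    def step(self, t, x):
        # Step scale first: skip the entire network if the gate allows it.
        if self.gate[t] and self.step_cache is not None:
            return self.step_cache
        h = x
        for l, modules in enumerate(self.layers):
            for m, module in enumerate(modules):
                if self.plan[t, l, m] and (l, m) in self.module_cache:
                    out = self.module_cache[(l, m)]   # layer scale: reuse cached output
                else:
                    out = module(h)                   # recompute and refresh the cache
                    self.module_cache[(l, m)] = out
                    self.computed += 1
                h = h + out  # assumed residual connection around every module
        self.step_cache = h
        return h


def make_module(scale):
    # Hypothetical stand-in for an attention or MLP sub-block.
    return lambda h: scale * h


T, L, M = 4, 2, 2
layers = [[make_module(0.1), make_module(0.2)] for _ in range(L)]
plan = np.zeros((T, L, M), dtype=bool)
plan[2, :, 1] = True                          # reuse module 1 of every layer at step 2
gate = np.array([False, False, False, True])  # reuse the whole step at step 3
runner = CachedRunner(layers, plan, gate)
x = np.ones(3)
outs = [runner.step(t, x) for t in range(T)]
print(runner.computed)  # 4 + 4 + 2 + 0 module evaluations
```

Checking the step gate before the per-module plan means a fully skipped step costs no module evaluations at all, which is why step 3 adds nothing to the count in this toy run.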
Abstract
Diffusion models deliver high-fidelity synthesis but remain slow because of iterative sampling. We empirically observe that features exhibit invariance during deterministic sampling, and we present InvarDiff, a training-free acceleration method that exploits this relative temporal invariance at both the timestep scale and the layer scale. From a few deterministic runs, we compute a per-timestep, per-layer, per-module binary cache plan matrix and apply a resampling correction to avoid drift when caches are reused at consecutive steps. Derived from quantile-based change metrics, this matrix specifies which module is reused rather than recomputed at each step. The same invariance criterion is applied at the step scale to enable cross-timestep caching, deciding whether an entire step can reuse cached results. During inference, InvarDiff performs step-first, then layer-wise, caching guided by this matrix. When applied to DiT and FLUX, our approach reduces redundant computation while preserving fidelity. Experiments show that InvarDiff achieves $2$-$3\times$ end-to-end speedups with minimal impact on standard quality metrics, and qualitatively we observe almost no degradation in visual quality compared with full computation.
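The calibration step can be sketched in Python under several assumptions the abstract does not specify: the change metric is taken to be a normalized mean absolute difference between a module's outputs at consecutive timesteps, the quantile is taken over calibration runs, and the thresholds `tau` and `tau_step`, the default `q = 0.9`, and all function names are hypothetical. The resampling correction for consecutive caches is not modeled.

```python
import numpy as np


def relative_change(prev, curr, eps=1e-8):
    # Mean absolute change over the feature axis, normalised by the mean
    # magnitude of the earlier feature (hypothetical metric choice).
    return np.abs(curr - prev).mean(axis=-1) / (np.abs(prev).mean(axis=-1) + eps)


def build_cache_plan(module_feats, q=0.9, tau=0.05):
    # module_feats: (R, T, L, M, D) output of module m in layer l at step t,
    # recorded over R deterministic calibration runs.
    change = relative_change(module_feats[:, :-1], module_feats[:, 1:])  # (R, T-1, L, M)
    q_change = np.quantile(change, q, axis=0)                             # (T-1, L, M)
    plan = np.zeros(module_feats.shape[1:4], dtype=bool)                  # (T, L, M)
    plan[1:] = q_change < tau  # step 0 has nothing cached, so it is always computed
    return plan


def build_step_gate(step_outputs, q=0.9, tau_step=0.02):
    # step_outputs: (R, T, D) network output at each step; same criterion, step scale.
    change = relative_change(step_outputs[:, :-1], step_outputs[:, 1:])  # (R, T-1)
    gate = np.zeros(step_outputs.shape[1], dtype=bool)
    gate[1:] = np.quantile(change, q, axis=0) < tau_step
    return gate


rng = np.random.default_rng(0)
R, T, L, M, D = 4, 6, 2, 2, 16
feats = rng.normal(size=(R, T, L, M, D))       # independent draws: large step-to-step change
feats[:, :, 0, 1] = feats[:, :1, 0, 1].copy()  # make module (layer 0, module 1) time-invariant
plan = build_cache_plan(feats)

step_out = np.ones((R, T, D))
step_out[:, 0] = 5.0                           # only the first transition changes the output
gate = build_step_gate(step_out)
```

Taking a high quantile over runs makes the plan conservative: a module is marked reusable only if its change stays small in nearly all calibration runs, not merely on average.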
