SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

Yasaman Haghighi, Alexandre Alahi

TL;DR

This work formalizes the caching error through an analysis of the model output sensitivity to perturbations in the denoising inputs, and proposes Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis.

Abstract

Diffusion models achieve state-of-the-art video generation quality, but their inference remains expensive due to the large number of sequential denoising steps. This has motivated a growing line of research on accelerating diffusion inference. Among training-free acceleration methods, caching reduces computation by reusing previously computed model outputs across timesteps. Existing caching methods rely on heuristic criteria to choose cache/reuse timesteps and require extensive tuning. We address this limitation with a principled sensitivity-aware caching framework. Specifically, we formalize the caching error through an analysis of the model output sensitivity to perturbations in the denoising inputs, i.e., the noisy latent and the timestep, and show that this sensitivity is a key predictor of caching error. Based on this analysis, we propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. Our framework provides a theoretical basis for adaptive caching, explains why prior empirical heuristics can be partially effective, and extends them to a dynamic, sample-specific approach. Experiments on Wan 2.1, CogVideoX, and LTX-Video show that SenCache achieves better visual quality than existing caching methods under similar computational budgets.

Paper Structure (31 sections, 12 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: SenCache is a caching algorithm for accelerating the inference of diffusion models. Unlike prior methods that rely on heuristics, SenCache uses a theoretically motivated measure of the network's sensitivity to input perturbations as the criterion for caching. All examples are generated with Wan 2.1 [wan2025open]. Under the same compute budget, SenCache better preserves the visual quality of the generated samples.
  • Figure 2: SenCache uses sensitivity as a caching criterion. At each denoising step, if the changes in the noisy latent $x_t$ and the sampling step $t$ are sufficiently small such that the sensitivity score (see \ref{eq:sensitivity-score}) falls below $\varepsilon$, we reuse the cached denoiser output; otherwise, we refresh the cache at the current state. By skipping expensive denoiser evaluations when the output is expected to change minimally, SenCache accelerates diffusion-model inference.
  • Figure 3: Sensitivity analysis of SiT-XL/2. (a) We analyze the network's output sensitivity by computing the norm of the Jacobian with respect to the noisy latent (Left) and the timestep (Right). We observe that both inputs are significant for estimating changes in the network output. Furthermore, we find that this norm can be accurately approximated with a small number of samples; our comparison shows that 16 samples provide an estimate comparable to that from 2048 or 8192 samples, indicating that large batch sizes are not required for this estimation. (b) Leveraging this sensitivity score, we select an optimized subset of 25 denoising steps from a 250-step SDE sampler, compared against a baseline of uniform step selection. The sensitivity-guided method strategically skips steps where the network output exhibits low sensitivity (i.e., is not changing much), allowing for effective caching without harming output quality. The visual results demonstrate that samples generated with our method suffer minimal degradation, whereas the uniform selection baseline results in significant visual degradation.
  • Figure 4: Effect of calibration set size on sensitivity estimation. We compare sensitivity profiles estimated from 8 videos versus 4096 videos and find that 8 diverse videos already yield a close match, indicating that large calibration sets are not required.
  • Figure 5: MAE between the denoiser outputs at two consecutive timesteps.
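
The reuse-or-refresh rule described in the Figure 2 caption can be sketched as a short sampling loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact sensitivity score is defined in the paper's equation, which is not reproduced here, so the sketch assumes a first-order form, score = sens_x * ||x_t - x_ref|| + sens_t * |t - t_ref|, where x_ref and t_ref are the inputs at the last cache refresh. The names `sample_with_sensitivity_cache`, `sens_x`, `sens_t`, and the plain Euler update are hypothetical choices made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a flat vector."""
    return math.sqrt(sum(c * c for c in v))


def sample_with_sensitivity_cache(
    denoiser: Callable[[Vector, float], Vector],
    x: Sequence[float],
    timesteps: Sequence[float],
    sens_x: float,
    sens_t: float,
    eps: float,
) -> Tuple[Vector, int]:
    """Euler sampling that reuses the cached denoiser output while an
    ASSUMED first-order sensitivity score stays below eps.

    sens_x and sens_t are hypothetical stand-ins for pre-estimated
    sensitivities of the output to the latent and to the timestep.
    Returns the final latent and the number of denoiser evaluations.
    """
    x = list(x)  # work on a copy so the caller's latent is untouched
    cached_out = None  # denoiser output from the last refresh
    x_ref: Vector = []  # latent at the last refresh
    t_ref = 0.0  # timestep at the last refresh
    evals = 0

    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        refresh = cached_out is None  # nothing cached on the first step
        if not refresh:
            dx = l2_norm([a - b for a, b in zip(x, x_ref)])
            dt = abs(t_cur - t_ref)
            # Assumed score; the paper's exact definition may differ.
            score = sens_x * dx + sens_t * dt
            refresh = score >= eps  # reuse only if score falls below eps
        if refresh:
            cached_out = denoiser(x, t_cur)
            x_ref, t_ref = list(x), t_cur
            evals += 1
        # Plain Euler update driven by the (possibly cached) output.
        x = [xi + (t_next - t_cur) * vi for xi, vi in zip(x, cached_out)]

    return x, evals
```

With `eps = 0` the score can never fall below the threshold, so the loop evaluates the denoiser at every step; a very large `eps` evaluates it once and reuses that output throughout. Intermediate values trade denoiser calls for accuracy, and because the score depends on the current latent, the refresh schedule differs from sample to sample.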