DiCache: Let Diffusion Model Determine Its Own Cache
Jiazi Bu, Pengyang Ling, Yujie Zhou, Yibin Wang, Yuhang Zang, Dahua Lin, Jiaqi Wang
TL;DR
This work accelerates diffusion models through caching by addressing two questions: when to cache and how to reuse cached features. It introduces DiCache, a training-free, runtime-adaptive approach composed of an Online Probe Profiling Scheme (which uses shallow-layer signals to estimate the caching error for each sample) and Dynamic Cache Trajectory Alignment (which combines multi-step caches according to probe-informed trajectories). The method achieves higher efficiency and fidelity than state-of-the-art baselines on WAN 2.1, HunyuanVideo, and Flux, and is compatible with sparse-attention acceleration such as Sparse VideoGen. The results demonstrate robust per-sample caching decisions and improved reconstruction quality, highlighting practical impact for scalable diffusion-model deployment.
Abstract
Recent years have witnessed the rapid development of acceleration techniques for diffusion models, especially caching-based acceleration methods. These studies seek to answer two fundamental questions: "When to cache" and "How to use cache". They typically rely on predefined empirical laws or dataset-level priors to determine caching timings and adopt handcrafted rules for multi-step cache utilization. However, given the highly dynamic nature of the diffusion process, they often exhibit limited generalizability and fail to cope with diverse samples. In this paper, we reveal a strong sample-specific correlation between the variation patterns of the shallow-layer feature differences in the diffusion model and those of deep-layer features. Moreover, we observe that the features from different model layers form similar trajectories. Based on these observations, we present DiCache, a novel training-free adaptive caching strategy for accelerating diffusion models at runtime, answering both when and how to cache within a unified framework. Specifically, DiCache is composed of two principal components: (1) The Online Probe Profiling Scheme leverages a shallow-layer online probe to obtain an on-the-fly indicator of the caching error in real time, enabling the model to dynamically customize the caching schedule for each sample. (2) Dynamic Cache Trajectory Alignment adaptively approximates the deep-layer feature output from multi-step historical caches based on the shallow-layer feature trajectory, facilitating higher visual quality. Extensive experiments validate DiCache's capability to achieve higher efficiency and improved fidelity over state-of-the-art approaches on various leading diffusion models, including WAN 2.1, HunyuanVideo, and Flux.
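As a rough illustration of the two components, the sketch below shows one way a shallow-layer probe could drive the caching decision and weight two cached deep-layer features. It is not the authors' implementation: the relative-L1 error measure, the accumulate-until-threshold rule, the ratio-based coefficient `gamma`, and all function and parameter names are assumptions chosen for illustration.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero (illustrative choice)


def relative_l1(current, reference):
    """Relative L1 change of `current` with respect to `reference`."""
    return float(np.abs(current - reference).sum() / (np.abs(reference).sum() + EPS))


def should_reuse_cache(probe_now, probe_prev, accumulated, threshold):
    """Decide "when to cache" from shallow-layer probe features (hypothetical rule).

    The probe change is added to a running error estimate. While the estimate
    stays below `threshold`, the cached deep features are reused; otherwise a
    full forward pass is requested and the estimate is reset to zero.
    Returns (reuse, new_accumulated).
    """
    accumulated += relative_l1(probe_now, probe_prev)
    if accumulated < threshold:
        return True, accumulated
    return False, 0.0


def align_cached_features(shallow_now, shallow_prev1, shallow_prev2,
                          deep_prev1, deep_prev2):
    """Decide "how to use cache" by following the shallow trajectory (hypothetical rule).

    `*_prev1` is the most recent cached step and `*_prev2` the one before it.
    `gamma` measures how far the shallow feature has moved from `prev2`
    relative to the `prev2 -> prev1` step; the same coefficient is then
    applied to the cached deep features.
    """
    step = np.linalg.norm(shallow_prev1 - shallow_prev2)
    if step < EPS:
        # No usable trajectory: fall back to the most recent cache.
        return deep_prev1
    gamma = np.linalg.norm(shallow_now - shallow_prev2) / step
    return deep_prev2 + gamma * (deep_prev1 - deep_prev2)
```

In a sampling loop, the probe (the first few blocks of the model) would run at every step, `should_reuse_cache` would gate the remaining blocks, and `align_cached_features` would supply the deep-layer estimate whenever those blocks are skipped.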
