MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma, Longhui Wei, Feng Wang, Shiliang Zhang, Qi Tian
TL;DR
MagCache is a magnitude-aware cache for fast video diffusion. It exploits a unified magnitude law: the per-step magnitude ratio ${\gamma_t}$ of successive residual outputs decreases monotonically, slowly over most timesteps and rapidly in the last few, which enables adaptive timestep skipping with bounded error. The method has two parts. An error model, ${\varepsilon_{\mathrm{skip}}(\hat{t}, t) \approx 1 - \prod_{i=\hat{t}+1}^t {\gamma_i}}$, estimates the error of skipping from step ${\hat{t}}$ to step ${t}$. An adaptive caching policy then reuses the cached residual while the total error ${\mathcal{E}_t}$ stays within a user-defined threshold ${\delta}$ and the skip length stays within ${K}$, and recomputes the residual otherwise. Empirically, MagCache delivers 2.10x-2.68x speedups on Open-Sora, CogVideoX, Wan 2.1, and HunyuanVideo while outperforming existing methods on LPIPS, SSIM, and PSNR under similar computational budgets. It requires only a single calibration sample and is more robust and efficient than prior methods such as TeaCache. The approach is plug-and-play, memory-efficient, and compatible with other acceleration techniques, which makes it a practical option for real-time or resource-constrained video generation. Limitations: validation is primarily on video diffusion models; future work will extend to more tasks and models and provide a broader release of code and resources.
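The caching policy described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `plan_steps`, the use of the absolute value of $1 - \prod \gamma_i$, and the choice to accumulate the total error as a running sum that resets after each full computation are all assumptions made for illustration. The ratios `gammas` are assumed to come from a prior calibration run.

```python
def plan_steps(gammas, delta, max_skip):
    """Decide, per timestep, whether to reuse the cached residual.

    gammas[t]  : calibrated magnitude ratio for step t (assumed precomputed)
    delta      : error threshold (the user-defined delta)
    max_skip   : maximum number of consecutive skipped steps (K)
    Returns a list of booleans; True means "skip and reuse the cache".
    """
    decisions = []
    acc_ratio = 1.0  # running product of gamma since the last full computation
    acc_error = 0.0  # assumed total error: sum of per-step skip errors
    skipped = 0      # consecutive skipped steps so far

    for t, gamma in enumerate(gammas):
        if t == 0:
            decisions.append(False)  # nothing is cached before the first step
            continue
        acc_ratio *= gamma
        acc_error += abs(1.0 - acc_ratio)  # estimated skip error, 1 - prod(gamma)
        if acc_error <= delta and skipped < max_skip:
            decisions.append(True)
            skipped += 1
        else:
            # Recompute the residual and reset the error bookkeeping.
            decisions.append(False)
            acc_ratio, acc_error, skipped = 1.0, 0.0, 0
    return decisions


if __name__ == "__main__":
    # Hypothetical ratios: near 1 early, dropping quickly at the end.
    print(plan_steps([1.0, 0.99, 0.99, 0.99, 0.9, 0.5], delta=0.06, max_skip=2))
```

With these hypothetical ratios the policy skips the second and third steps, is forced to recompute at the fourth by the skip-length limit, and recomputes the last two steps because the ratios fall too far from 1.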
Abstract
Existing acceleration techniques for video diffusion models often rely on uniform heuristics or time-embedding variants to skip timesteps and reuse cached features. These approaches typically require extensive calibration with curated prompts and risk inconsistent outputs due to prompt-specific overfitting. In this paper, we report a novel and robust finding: a unified magnitude law observed across different models and prompts. Specifically, the magnitude ratio of successive residual outputs decreases monotonically, slowly and steadily over most timesteps and rapidly in the last several steps. Leveraging this insight, we introduce a Magnitude-aware Cache (MagCache) that adaptively skips unimportant timesteps using an error modeling mechanism and an adaptive caching strategy. Unlike existing methods, which require dozens of curated samples for calibration, MagCache requires only a single sample. Experimental results show that MagCache achieves 2.10x-2.68x speedups on Open-Sora, CogVideoX, Wan 2.1, and HunyuanVideo while preserving superior visual fidelity. It significantly outperforms existing methods in LPIPS, SSIM, and PSNR under similar computational budgets.
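To make the notion of a magnitude ratio concrete, the following Python sketch computes one ratio per pair of successive residual outputs from a single calibration run. The abstract does not state the exact definition, so the ratio of global L2 norms used here is an assumption, and the names `l2_norm` and `magnitude_ratios` are hypothetical.

```python
import math


def l2_norm(values):
    """L2 norm of a flat sequence of floats."""
    return math.sqrt(sum(v * v for v in values))


def magnitude_ratios(residuals):
    """Ratio of each residual's magnitude to that of the previous step.

    residuals: list of flattened residual outputs, one per timestep, in
    sampling order. Assumes every residual has a nonzero norm.
    Returns len(residuals) - 1 ratios.
    """
    ratios = []
    for prev, cur in zip(residuals, residuals[1:]):
        ratios.append(l2_norm(cur) / l2_norm(prev))
    return ratios


if __name__ == "__main__":
    # Toy residuals with norms 5.0, 5.0, and 2.5.
    toy = [[3.0, 4.0], [3.0, 4.0], [1.5, 2.0]]
    print(magnitude_ratios(toy))
```

In a real pipeline the residuals would be model tensors rather than short lists; the sketch only shows the bookkeeping that a single calibration sample would need.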
