MagCache: Fast Video Generation with Magnitude-Aware Cache

Zehong Ma, Longhui Wei, Feng Wang, Shiliang Zhang, Qi Tian

TL;DR

MagCache introduces a magnitude-aware cache for fast video diffusion by exploiting a universal law: the per-step magnitude ratio ${\gamma_t}$ describing residual changes is mostly stable early and decreases monotonically, enabling adaptive timestep skipping with bounded error. A two-part system combines accurate error modeling ${\varepsilon_{\mathrm{skip}}(\hat{t}, t) \approx 1 - \prod_{i=\hat{t}+1}^t {\gamma_i}}$ with an adaptive caching policy that reuses or recomputes residuals to keep total error ${\mathcal{E}_t}$ within a user-defined threshold ${\delta}$ and skip length within ${K}$. Empirically, MagCache delivers 2.10x-2.68x speedups across models like Open-Sora, CogVideoX, Wan 2.1, and HunyuanVideo while improving LPIPS, SSIM, and PSNR under similar compute, and requires only a single calibration sample, outperforming prior methods such as TeaCache in robustness and efficiency. The approach is plug-and-play, memory-efficient, and compatible with other acceleration techniques, suggesting significant practical impact for real-time or resource-constrained video generation. Limitations include validation primarily on video diffusion models; future work will extend to more tasks and models and provide broader release of code and resources.
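
The skip rule summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `plan_skips`, the way the error is accumulated (summing $|1 - \prod \gamma_i|$ over consecutive skipped steps), and resetting the accumulators after each full computation are all assumptions made for illustration. Only the two conditions themselves (total error within ${\delta}$, skip length within ${K}$) come from the text.

```python
from typing import List


def plan_skips(gammas: List[float], delta: float, max_skip: int) -> List[bool]:
    """Return one flag per step: True = reuse the cached residual, False = recompute.

    gammas[t] is the calibrated magnitude ratio for step t (hypothetical input).
    delta is the error threshold and max_skip is the maximum skip length K.
    """
    decisions: List[bool] = []
    ratio = 1.0   # running product of gamma since the last full computation
    err = 0.0     # accumulated error estimate (assumed accumulation rule)
    skipped = 0   # number of consecutive skipped steps

    for t, g in enumerate(gammas):
        if t == 0:
            decisions.append(False)  # nothing is cached before the first step
            continue
        cand_ratio = ratio * g
        cand_err = err + abs(1.0 - cand_ratio)
        if cand_err <= delta and skipped + 1 <= max_skip:
            # Both conditions hold: reuse the cache and keep accumulating.
            decisions.append(True)
            ratio, err, skipped = cand_ratio, cand_err, skipped + 1
        else:
            # A condition fails: run the model and reset the accumulators.
            decisions.append(False)
            ratio, err, skipped = 1.0, 0.0, 0
    return decisions


if __name__ == "__main__":
    # Made-up ratios: near 1 early, dropping sharply at the end.
    print(plan_skips([1.0, 0.99, 0.98, 0.97, 0.5], delta=0.06, max_skip=2))
```

With these made-up ratios, the second and third steps are skipped, the fourth is recomputed because the accumulated estimate would exceed the threshold, and the last step is recomputed because its ratio is far from 1.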

Abstract

Existing acceleration techniques for video diffusion models often rely on uniform heuristics or time-embedding variants to skip timesteps and reuse cached features. These approaches typically require extensive calibration with curated prompts and risk inconsistent outputs due to prompt-specific overfitting. In this paper, we introduce a novel and robust discovery: a unified magnitude law observed across different models and prompts. Specifically, the magnitude ratio of successive residual outputs decreases monotonically: steadily over most timesteps, then rapidly in the last several steps. Leveraging this insight, we introduce a Magnitude-aware Cache (MagCache) that adaptively skips unimportant timesteps using an error modeling mechanism and an adaptive caching strategy. Unlike existing methods, which require dozens of curated samples for calibration, MagCache requires only a single sample. Experimental results show that MagCache achieves 2.10x-2.68x speedups on Open-Sora, CogVideoX, Wan 2.1, and HunyuanVideo while preserving superior visual fidelity. It significantly outperforms existing methods in LPIPS, SSIM, and PSNR under similar computational budgets.

Paper Structure

This paper contains 26 sections, 20 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Relationships between residuals across diffusion timesteps. Differences between adjacent residuals are mainly due to magnitude rather than direction during the first 80% of steps. In the final 20%, both magnitude ratio and cosine distance change sharply in opposite trends, but the magnitude ratio still reflects residual differences. (a) Average magnitude ratio decreases gradually, then drops sharply near the end; ratios close to 1 indicate stable transitions suitable for cache reuse. (b) Standard deviation of the magnitude ratio remains near zero in early steps, indicating stable magnitudes. (c) Token-wise cosine distance stays near zero early on, showing consistent residual directions.
  • Figure 2: Overview of MagCache. MagCache consists of an error modeling mechanism and an adaptive caching strategy. Using the estimated total accumulated error $\mathcal{E}$, MagCache adaptively reuses the old cache or computes a new one by checking the two conditions in Sec. \ref{sec:method_cache}.
  • Figure 3: Comparison of visual quality and efficiency (measured by latency) with the competing method. MagCache outperforms TeaCache [liu2024timestep] in both visual quality and efficiency. Latency is evaluated on a single A800 GPU. Video generation specifications: Open-Sora (51 frames, 480p), Wan 2.1 1.3B [wan2025] (81 frames, 480p). Best viewed zoomed in.
  • Figure 4: Average magnitude ratio between $\mathbf{r}_t$ and $\mathbf{r}_{\hat{t}}$, where $\hat{t} = t-3$. $\Gamma(t, \hat{t})$ is the ground-truth magnitude ratio, while $\prod_{i = \hat{t}+1}^t \gamma_i$ is the magnitude ratio predicted by the multiplicative formulation in Equation \ref{eq: multiply}.
  • Figure 5: Videos generated by Wan 2.1 1.3B using the original model, TeaCache-Fast, and our MagCache-Fast. Best viewed zoomed in.
  • ...and 4 more figures
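
The comparison described in the Figure 4 caption, a directly measured ratio $\Gamma(t, \hat{t})$ against the product $\prod_{i = \hat{t}+1}^t \gamma_i$ of per-step ratios, can be sketched in Python as follows. This is an illustrative sketch, not the paper's code. It assumes each ratio is the mean of token-wise norm ratios between two steps, and the names `mean_ratio` and `compare`, as well as the toy norm values, are hypothetical.

```python
import math
from typing import List, Tuple


def mean_ratio(num: List[float], den: List[float]) -> float:
    """Average of token-wise norm ratios between two steps (assumed definition)."""
    return sum(a / b for a, b in zip(num, den)) / len(num)


def compare(norms: List[List[float]], t: int, gap: int = 3) -> Tuple[float, float]:
    """Return (direct ratio, product of per-step ratios) for steps t - gap and t.

    norms[s][k] is the L2 norm of token k of the residual at step s.
    """
    t_hat = t - gap
    direct = mean_ratio(norms[t], norms[t_hat])
    predicted = math.prod(
        mean_ratio(norms[i], norms[i - 1]) for i in range(t_hat + 1, t + 1)
    )
    return direct, predicted


if __name__ == "__main__":
    # Toy data: two tokens whose norms shrink at slightly different rates.
    toy_norms = [[1.0, 1.0], [0.9, 1.0], [0.9, 0.9], [0.8, 0.9]]
    direct, predicted = compare(toy_norms, t=3)
    print(f"direct={direct:.4f} predicted={predicted:.4f}")
```

When every token shrinks by the same factor at each step, the two quantities coincide exactly. When tokens shrink at different rates, the product is only an approximation of the direct ratio, which is the gap Figure 4 reports on.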