FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching

Jiacheng Liu, Peiliang Cai, Qinming Zhou, Yuqi Lin, Deyang Kong, Benhao Huang, Yupei Pan, Haowen Xu, Chang Zou, Junshu Tang, Shikang Zheng, Linfeng Zhang

TL;DR

FreqCa addresses the computational bottleneck of diffusion transformers by applying frequency-aware caching. It decouples features into low-frequency and high-frequency components, directly reusing the former while predicting the latter with a second-order Hermite predictor, and caches a single Cumulative Residual Feature (CRF) to achieve $O(1)$ memory. The approach yields $6$–$7\times$ acceleration with less than $2\%$ quality degradation across multiple diffusion models and tasks, demonstrating practical, scalable inference speedups on consumer hardware. These findings establish a new state of the art in efficient diffusion inference by unifying the reuse and forecast paradigms through frequency decomposition and CRF caching.
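The $O(1)$ memory claim can be illustrated with a toy residual stack. The sketch below rests on an assumption not spelled out here: that the CRF is the total residual accumulated over all blocks, i.e. the final hidden state minus the input to the first block. The toy blocks (`toy_block`), weights, and sizes are hypothetical stand-ins for transformer blocks. The point is the storage contrast: layer-wise caching keeps one tensor per layer, CRF caching keeps one tensor in total.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_LAYERS, TOKENS, DIM = 8, 16, 32

# Hypothetical toy blocks: block l adds a residual f_l(x) = tanh(x @ W_l).
weights = [rng.standard_normal((DIM, DIM)) * 0.05 for _ in range(NUM_LAYERS)]


def toy_block(x, w):
    """Residual contribution of one block (stand-in for attention + MLP)."""
    return np.tanh(x @ w)


def full_forward(x0):
    """Run every block; return the final hidden state and per-layer residuals."""
    x = x0
    residuals = []
    for w in weights:
        r = toy_block(x, w)
        residuals.append(r)  # what layer-wise caching would have to store
        x = x + r
    return x, residuals


x0 = rng.standard_normal((TOKENS, DIM))
x_final, layer_cache = full_forward(x0)

# CRF caching (assumed form): store only the residual summed over all blocks.
crf = x_final - x0

# On a skipped step, approximate the final hidden state from the new input
# and the cached CRF without running any block.
x0_next = x0 + 0.01 * rng.standard_normal((TOKENS, DIM))
x_final_approx = x0_next + crf

print(len(layer_cache), "cached tensors (layer-wise) vs 1 (CRF)")
```

In FreqCa the cached feature is further split into frequency bands before reuse or prediction; the sketch reuses it whole to isolate the memory argument.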

Abstract

The application of diffusion transformers is hindered by their significant inference costs. Recently, feature caching has been proposed to address this problem by reusing features from previous timesteps, thereby skipping computation in subsequent timesteps. However, previous feature caching methods assume that features in adjacent timesteps are similar or continuous, which does not hold in all settings. To investigate this, this paper begins with a frequency-domain analysis, which reveals that different frequency bands in the features of diffusion models exhibit different dynamics across timesteps. Concretely, low-frequency components, which determine the structure of images, exhibit higher similarity but poor continuity. In contrast, high-frequency bands, which encode the details of images, show significant continuity but poor similarity. These observations motivate us to propose Frequency-aware Caching (FreqCa), which directly reuses the low-frequency components based on their similarity, while using a second-order Hermite interpolator to predict the volatile high-frequency components based on their continuity. Furthermore, we propose caching the Cumulative Residual Feature (CRF) instead of the features of all layers, which reduces the memory footprint of feature caching by 99%. Extensive experiments on FLUX.1-dev, FLUX.1-Kontext-dev, Qwen-Image, and Qwen-Image-Edit demonstrate its effectiveness in both generation and editing. Code is available in the supplementary materials and will be released on GitHub.
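The split into low- and high-frequency bands can be sketched with a 2D FFT over the spatial token grid. This is a minimal illustration under stated assumptions: features are laid out as an (H, W, C) grid, and a circular low-pass mask with a hypothetical `cutoff` radius defines the low band. The paper mentions FFT or DCT as options, and its actual band boundary and feature layout may differ. The helper name `split_frequency` is hypothetical. Because the high band is defined as the remainder, low + high reconstructs the original feature exactly.

```python
import numpy as np


def split_frequency(feat, cutoff=0.25):
    """Split an (H, W, C) feature map into low- and high-frequency parts.

    cutoff is a hypothetical radius, as a fraction of the Nyquist frequency,
    at or below which FFT coefficients are assigned to the low band.
    """
    h, w, _ = feat.shape
    spec = np.fft.fft2(feat, axes=(0, 1))  # per-channel 2D FFT over the token grid
    fy = np.fft.fftfreq(h)[:, None]        # frequencies in cycles/sample, in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy**2 + fx**2) / 0.5  # normalize so Nyquist maps to 1
    mask = (radius <= cutoff)[:, :, None]  # symmetric mask, broadcast over channels
    low = np.fft.ifft2(spec * mask, axes=(0, 1)).real
    high = feat - low                      # everything the low-pass removed
    return low, high


rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 32, 8))
low, high = split_frequency(feat)

# A spatially constant feature is pure DC, so its high band is zero.
const_low, const_high = split_frequency(np.ones((16, 16, 4)))
```

In a caching loop, `low` would be the part reused across steps and `high` the part handed to a predictor; a DCT-based split (e.g. via `scipy.fft.dctn`) would follow the same pattern with a different basis.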

Paper Structure

This paper contains 36 sections, 3 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Images sampled by Qwen-Image using FreqCa at 7.14$\times$ acceleration.
  • Figure 2: Analysis from the frequency perspective. (a)-(b): Temporal similarity analysis using cosine similarity for low-frequency and high-frequency components across different step intervals. (c)-(d): Feature trajectories visualized via Principal Component Analysis (PCA).
  • Figure 3: Overview of the FreqCa framework. (a) CRF Caching: Instead of caching features at every layer, we cache only the single Cumulative Residual Feature (CRF) at the end. (b) Frequency-aware Caching: The cached features are separated into low- and high-frequency bands using frequency decomposition techniques such as FFT or DCT. (c) Low-Frequency Strategy: The low-frequency component is directly reused from the prior step. (d) High-Frequency Strategy: The high-frequency component is forecast using a Hermite predictor fitted on the last two activated steps.
  • Figure 4: Box plots of Mean Squared Error (MSE) between ground-truth and predicted features per timestep. (a) layer-wise feature caching and (b) cumulative residual feature (CRF) caching.
  • Figure 5: GEdit Benchmark results on Qwen-Image-Edit; FreqCa outperforms most baselines.
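Steps (c) and (d) of Figure 3 can be sketched as follows. The exact predictor formula is not given here, so this uses an assumed form: a quadratic (second-order) Hermite extrapolant that matches the high-frequency values at the last two activated steps `t0`, `t1` and a derivative estimate `d1` at `t1`. Where `d1` comes from is an assumption and is passed in as a hypothetical input. The helper names `hermite_extrapolate` and `freqca_step` are hypothetical. The low-frequency part is reused unchanged.

```python
import numpy as np


def hermite_extrapolate(t0, f0, t1, f1, d1, t):
    """Quadratic Hermite extrapolation of a (possibly array-valued) signal to time t.

    Builds p(s) = f1 + d1*(s - t1) + c*(s - t1)**2 so that
    p(t0) = f0, p(t1) = f1 and p'(t1) = d1, then evaluates p(t).
    """
    h = t0 - t1
    c = (f0 - f1 - d1 * h) / h**2
    dt = t - t1
    return f1 + d1 * dt + c * dt**2


def freqca_step(low_cached, high_t0, high_t1, d1, t0, t1, t):
    """Assemble the feature for a skipped step t (hypothetical helper).

    Low band: reused as-is from the last activated step (high similarity).
    High band: forecast with the Hermite extrapolant (high continuity).
    """
    high_pred = hermite_extrapolate(t0, high_t0, t1, high_t1, d1, t)
    return low_cached + high_pred


# Diffusion timesteps decrease during sampling: activated at 0.9, then 0.8; skip to 0.5.
def g(s):
    """A toy high-frequency trajectory that is exactly quadratic in s."""
    return 2.0 * s**2 - 3.0 * s + 1.0


def dg(s):
    """Exact derivative of g."""
    return 4.0 * s - 3.0


pred = hermite_extrapolate(0.9, g(0.9), 0.8, g(0.8), dg(0.8), 0.5)
low = np.ones(4)
feat = freqca_step(low, g(0.9) * np.ones(4), g(0.8) * np.ones(4), dg(0.8) * np.ones(4), 0.9, 0.8, 0.5)
```

One design consequence of this form: if `d1` were taken as the finite difference `(f1 - f0) / (t1 - t0)`, the quadratic coefficient `c` would vanish and the predictor would reduce to linear extrapolation, so a genuinely second-order predictor needs derivative information beyond the two cached values.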