Accelerating Controllable Generation via Hybrid-grained Cache

Lin Liu, Huixia Ben, Shuo Wang, Jinda Lu, Junxiang Qiu, Shengeng Tang, Yanbin Hao

TL;DR

This paper tackles the inefficiency of controllable diffusion-based generation by proposing Hybrid-grained Cache (HGC), a caching framework that combines block-level coarse caching with prompt-level fine caching to reduce computation without sacrificing control fidelity. The block-level cache accelerates structure-focused steps by using feature similarity to choose the steps at which block outputs are cached, while the prompt-level cache reuses cross-attention states and fuses prompt-guided and non-prompt computations at a gate step. The method introduces two cache-density controls, $\lambda_{\text{intra}}$ and $\lambda_{\text{inter}}$, to balance intra- and inter-block updating, and a gate-based fusion strategy for cross-attention, both in the generative module. Across ADE20K, COCO-Stuff, and MultiGen-20M tasks, HGC achieves up to $\sim 63\%$ MACs reduction with only small degradations in FID and CLIP scores, and extends to video generation with about $40\%$ MACs reduction, demonstrating substantial practical acceleration with preserved perceptual quality.
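The block-level idea of reusing a block's output when its input features barely change between steps can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `BlockCache`, the cosine-similarity test, and the `threshold` parameter are assumptions made for illustration, and the paper's $\lambda_{\text{intra}}$ and $\lambda_{\text{inter}}$ controls are not modeled here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class BlockCache:
    """Hypothetical coarse-grained cache wrapped around one block.

    The block is recomputed only when the incoming features differ
    enough from the features seen at the last real computation.
    """

    def __init__(self, block_fn, threshold=0.99):
        self.block_fn = block_fn      # stand-in for an encoder/decoder block
        self.threshold = threshold    # assumed similarity cutoff
        self.last_input = None
        self.last_output = None
        self.computed = 0
        self.reused = 0

    def __call__(self, features):
        if (self.last_input is not None
                and cosine_similarity(features, self.last_input) >= self.threshold):
            self.reused += 1          # skip the block, return cached output
            return self.last_output
        self.last_input = list(features)
        self.last_output = self.block_fn(features)
        self.computed += 1
        return self.last_output


# Toy usage: a "block" that doubles every feature.
block = BlockCache(lambda f: [2.0 * x for x in f], threshold=0.999)
out1 = block([1.0, 0.0])      # first call: computed
out2 = block([1.0, 0.001])    # nearly identical input: cached output reused
out3 = block([0.0, 1.0])      # very different input: recomputed
```

Comparing against the input of the last real computation, rather than the last input seen, keeps small per-step changes from accumulating unnoticed; whether HGC does the same is not stated in this summary.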

Abstract

Controllable generative models have been widely used to improve the realism of synthetic visual content. However, such models must bear the computational cost of both processing control conditions and generating content, which makes generation inefficient overall. To address this issue, we propose a Hybrid-Grained Cache (HGC) approach that reduces computational overhead by adopting cache strategies with different granularities at different computational stages. Specifically, (1) we use a coarse-grained (block-level) cache based on feature reuse to dynamically bypass redundant computations in encoder-decoder blocks between inference steps. (2) We design a fine-grained (prompt-level) cache that acts within a module: it reuses cross-attention maps across consecutive inference steps and extends them to the corresponding module computations of adjacent steps. These caches of different granularities can be seamlessly integrated into each computational stage of the controllable generation process. We verify the effectiveness of HGC on four benchmark datasets, with particular attention to how it balances generation efficiency and visual quality. For example, on the COCO-Stuff segmentation benchmark, HGC reduces the computational cost (MACs) by 63% (from 18.22T to 6.70T), while keeping the loss of semantic fidelity (quantified performance degradation) within 1.5%.
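The reported COCO-Stuff figure can be checked directly from the two MACs values quoted above. The short sketch below recomputes the relative reduction and the equivalent compute ratio; the helper name `macs_reduction` is introduced here only for illustration.

```python
def macs_reduction(baseline_t, cached_t):
    """Relative reduction in MACs, as a fraction of the baseline."""
    return (baseline_t - cached_t) / baseline_t


baseline_macs = 18.22   # tera-MACs without HGC, from the abstract
hgc_macs = 6.70         # tera-MACs with HGC, from the abstract

reduction = macs_reduction(baseline_macs, hgc_macs)
ratio = baseline_macs / hgc_macs

print(f"MACs reduction: {reduction:.1%}")   # MACs reduction: 63.2%
print(f"Compute ratio:  {ratio:.2f}x")      # Compute ratio:  2.72x
```

The ratio describes arithmetic work only; wall-clock speedup depends on hardware and memory traffic and is not implied by these numbers.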

Paper Structure

This paper contains 16 sections, 14 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: We visualize the intermediate results of adding conditions at different steps in the controllable generative model: (a) adding throughout all steps, (b) adding only in the first ten steps, and (c) adding only after the first ten steps.
  • Figure 2: Controllable Generation with our Hybrid-grained Caches (HGC), where the coarse-grained cache performs either full or partial caching of blocks across different steps. The fine-grained cache is governed by the gate step, which activates the cache of cross-attention maps at their respective steps.
  • Figure 3: Visualization of generation with and without HGC: (a) and (b) generation with a segmentation condition, (c) generation with an edge map, and (d) generation with a depth map.