Block-wise Adaptive Caching for Accelerating Diffusion Policy

Kangye Ji, Yuan Meng, Hanyun Cui, Ye Li, Shengjia Hua, Lei Chen, Zhi Wang

TL;DR

This work tackles the computational bottleneck of Diffusion Policy for real-time robotic control by introducing Block-wise Adaptive Caching (BAC), a training-free, block-level caching framework. BAC combines an Adaptive Caching Scheduler (ACS), which selects the timesteps at which each transformer block's cache is updated, with a Bubbling Union Algorithm (BUA), which curbs inter-block error propagation and thereby addresses the error surges observed when caching is applied block by block. The approach yields lossless acceleration with up to approximately 3x speedup on multiple robotic benchmarks, and ablations show that both ACS and BUA are necessary. The method enables practical real-time diffusion-based visuomotor control without additional training, broadening the applicability of Diffusion Policy in robotics and vision-language-action systems.

Abstract

Diffusion Policy has demonstrated strong visuomotor modeling capabilities, but its high computational cost renders it impractical for real-time robotic control. Despite huge redundancy across repetitive denoising steps, existing diffusion acceleration techniques fail to generalize to Diffusion Policy due to fundamental architectural and data divergences. In this paper, we propose Block-wise Adaptive Caching (BAC), a method to accelerate Diffusion Policy by caching intermediate action features. BAC achieves lossless action generation acceleration by adaptively updating and reusing cached features at the block level, based on a key observation that feature similarities vary non-uniformly across timesteps and blocks. To operationalize this insight, we first propose the Adaptive Caching Scheduler, designed to identify optimal update timesteps by maximizing the global feature similarities between cached and skipped features. However, applying this scheduler for each block leads to significant error surges due to the inter-block propagation of caching errors, particularly within Feed-Forward Network (FFN) blocks. To mitigate this issue, we develop the Bubbling Union Algorithm, which truncates these errors by updating the upstream blocks with significant caching errors before downstream FFNs. As a training-free plugin, BAC is readily integrable with existing transformer-based Diffusion Policy and vision-language-action models. Extensive experiments on multiple robotic benchmarks demonstrate that BAC achieves up to 3x inference speedup for free.
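
The scheduling idea in the abstract (choosing update timesteps that maximize the total similarity between cached and skipped features) can be illustrated with a small sketch. This is not the authors' implementation: the function name `schedule_updates`, the square similarity matrix `sim` (where `sim[u][t]` is assumed to measure how similar a block's features at timestep `t` are to those cached at timestep `u`), the fixed update budget `num_updates`, and the use of dynamic programming are all assumptions made for illustration.

```python
from typing import List, Sequence


def schedule_updates(sim: Sequence[Sequence[float]], num_updates: int) -> List[int]:
    """Pick `num_updates` timesteps at which to refresh a block's cache.

    Hypothetical setup: every timestep t reuses the features cached at the
    most recent update step u <= t and contributes sim[u][t] to the score.
    The first timestep is always an update, since nothing is cached before it.
    """
    T = len(sim)
    if not 1 <= num_updates <= T:
        raise ValueError("num_updates must be between 1 and the number of timesteps")

    # seg[u][v] = total similarity when the cache from step u serves steps u..v-1.
    seg = [[0.0] * (T + 1) for _ in range(T)]
    for u in range(T):
        for v in range(u + 1, T + 1):
            seg[u][v] = seg[u][v - 1] + sim[u][v - 1]

    neg_inf = float("-inf")
    # best[k][v] = best score covering steps 0..v-1 with exactly k updates.
    best = [[neg_inf] * (T + 1) for _ in range(num_updates + 1)]
    back = [[-1] * (T + 1) for _ in range(num_updates + 1)]
    best[0][0] = 0.0

    for k in range(1, num_updates + 1):
        for v in range(k, T + 1):
            for u in range(k - 1, v):
                if best[k - 1][u] == neg_inf:
                    continue
                score = best[k - 1][u] + seg[u][v]
                if score > best[k][v]:
                    best[k][v] = score
                    back[k][v] = u

    # Walk the back-pointers from the final timestep to recover the schedule.
    steps: List[int] = []
    v = T
    for k in range(num_updates, 0, -1):
        u = back[k][v]
        steps.append(u)
        v = u
    return steps[::-1]


# Toy similarity matrix: timesteps {0, 1} resemble each other, as do {2, 3}.
toy_sim = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
toy_schedule = schedule_updates(toy_sim, num_updates=2)
```

In this toy case a budget of two updates places the second refresh at timestep 2, where the features change. In a block-wise setting, one such schedule would be computed per block; the inter-block error propagation described in the abstract is not modeled in this sketch.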

Paper Structure

This paper contains 38 sections, 2 theorems, 27 equations, 18 figures, 29 tables.

Key Result

Proposition 3.1

Given an upstream error $\delta$, we have where

Figures (18)

  • Figure 1: Temporal and block-wise feature similarity patterns. (a) Similarity matrices of blocks in the third decoder layer. (b) Similarity change curves of different blocks. The feature similarity between consecutive timesteps varies non-uniformly over time and differs across blocks.
  • Figure 2: Framework of Block-wise Adaptive Caching (BAC). BAC enables adaptive feature caching by introducing the Adaptive Caching Scheduler, and further supports block-wise scheduling through the Bubbling Union Algorithm.
  • Figure 3:
  • Figure 4:
  • Figure 5:
  • ...and 13 more figures

Theorems & Definitions (4)

  • Proposition 3.1
  • Remark 3.1
  • Proposition A.1
  • proof