Block-wise Adaptive Caching for Accelerating Diffusion Policy
Kangye Ji, Yuan Meng, Hanyun Cui, Ye Li, Shengjia Hua, Lei Chen, Zhi Wang
TL;DR
This work tackles the computational bottleneck of Diffusion Policy for real-time robotic control by introducing Block-wise Adaptive Caching (BAC), a training-free, block-level caching framework. BAC combines an Adaptive Caching Scheduler (ACS), which selects the cache-update timesteps for each transformer block, with a Bubbling Union Algorithm (BUA), which curbs inter-block error propagation and thereby addresses the error surges observed when caching is applied at the block level. The approach yields lossless acceleration with up to approximately 3x speedup on multiple robotic benchmarks, with robust performance and ablations showing that both ACS and BUA are necessary. The method enables practical real-time diffusion-based visuomotor control without additional training, broadening the applicability of Diffusion Policy in robotics and vision-language-action systems.
Abstract
Diffusion Policy has demonstrated strong visuomotor modeling capabilities, but its high computational cost renders it impractical for real-time robotic control. Despite huge redundancy across repetitive denoising steps, existing diffusion acceleration techniques fail to generalize to Diffusion Policy due to fundamental architectural and data divergences. In this paper, we propose Block-wise Adaptive Caching (BAC), a method to accelerate Diffusion Policy by caching intermediate action features. BAC achieves lossless action generation acceleration by adaptively updating and reusing cached features at the block level, based on a key observation that feature similarities vary non-uniformly across timesteps and blocks. To operationalize this insight, we first propose the Adaptive Caching Scheduler, designed to identify optimal update timesteps by maximizing the global feature similarities between cached and skipped features. However, applying this scheduler to each block leads to significant error surges due to the inter-block propagation of caching errors, particularly within Feed-Forward Network (FFN) blocks. To mitigate this issue, we develop the Bubbling Union Algorithm, which truncates these errors by updating the upstream blocks with significant caching errors before downstream FFNs. As a training-free plugin, BAC is readily integrable with existing transformer-based Diffusion Policy and vision-language-action models. Extensive experiments on multiple robotic benchmarks demonstrate that BAC achieves up to 3x inference speedup for free.
