Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy

Pengyuan Wu; Pingrui Zhang; Zhigang Wang; Dong Wang; Bin Zhao; Xuelong Li

Abstract

Diffusion-based policies have achieved remarkable results in robotic manipulation but often struggle to adapt rapidly in dynamic scenarios, leading to delayed responses or task failures. We present DCDP, a Dynamic Closed-Loop Diffusion Policy framework that integrates chunk-based action generation with real-time correction. DCDP combines a self-supervised dynamic feature encoder, cross-attention fusion, and an asymmetric action encoder-decoder to inject environmental dynamics before action execution, which enables real-time closed-loop action correction and improves the system's adaptability in dynamic scenarios. In dynamic PushT simulations, DCDP improves adaptability by 19% without retraining while requiring only 5% additional computation. Its modular design enables plug-and-play integration, achieving both temporal coherence and real-time responsiveness in dynamic robotic scenarios, including real-world manipulation tasks. The project page is at: https://github.com/wupengyuan/dcdp

Paper Structure (22 sections, 21 equations, 5 figures, 3 tables)

Introduction
Related Work
Behavior Cloning
Closed-Loop Action Chunks
Dynamic Manipulation
Method
Preliminaries
Stage 1: Fast Dynamic Aware Policy Training
History Bank Memory Learning
Differential Feature Computation
Temporal Attention
Fusion Cross-Attention
Self-Supervised with Differential
Variational Autoencoder
Loss
...and 7 more sections

Figures (5)

Figure 1: Comparison of the Open-Loop Diffusion Policy, the Closed-Loop Diffusion Policy, and our Closed-Loop Diffusion Policy with Dynamic Correction. The orange line depicts the Open-Loop Diffusion Policy, which predicts an action chunk of length $H$. The green line denotes the Closed-Loop Diffusion Policy, which uses single-step inference to achieve closed-loop control but incurs high latency and requires frequent re-planning. By contrast, our policy (red line) leverages information from a length-$M$ History Bank to perform lightweight, fast corrections at each inference step, thereby achieving closed-loop control.
Figure 2: Overview of DCDP. Our method adopts a two-stage framework. In Stage 1 (left panel), we train the Fast Dynamic-Aware Policy and a variational autoencoder (VAE). In Stage 2 (right panel), we apply training-free, per-step action corrections using the Fast Dynamic-Aware Policy; the corrected actions are then decoded by the VAE decoder.
Figure 3: Different types of perturbations are considered, where the random perturbation updates its direction at certain time steps.
Figure 4: Visualization of how various inference strategies respond to perturbations.
Figure 5: The two tasks comprise two types of perturbations: constant-direction and random-direction. These perturbations were applied exclusively to the components highlighted in the figure.
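
The contrast drawn in the Figure 1 caption, between executing a length-$H$ action chunk open-loop and applying a lightweight per-step correction from a length-$M$ History Bank, can be illustrated with a toy 1-D tracking problem. The sketch below is not the DCDP implementation: `plan_chunk`, `correction`, and `rollout` are hypothetical stand-ins, the "policy" is a straight-line planner rather than a diffusion model, and the corrector is a finite-difference velocity estimate rather than the learned encoder, cross-attention fusion, and VAE decoder described in the paper. It only shows why a cheap correction at every step can compensate for a target that moves while a chunk is being executed.

```python
from collections import deque

H = 8  # action-chunk length (the H of Figure 1)
M = 3  # history-bank length (the M of Figure 1)


def plan_chunk(pos, target, horizon=H):
    """Hypothetical stand-in for the slow chunk policy.

    Plans `horizon` equal steps toward the target as observed at planning
    time, i.e. it assumes the target stays still during the chunk.
    """
    step = (target - pos) / horizon
    return [step] * horizon


def correction(history):
    """Hypothetical stand-in for the fast corrector.

    Estimates the target's per-step displacement from the history bank by
    a finite difference over the stored observations.
    """
    obs = list(history)
    if len(obs) < 2:
        return 0.0
    return (obs[-1] - obs[0]) / (len(obs) - 1)


def rollout(correct, steps=2 * H, drift=0.5):
    """Run the toy loop and return the tracking error after every step."""
    pos, target = 0.0, 4.0
    history = deque(maxlen=M)  # keeps only the last M target observations
    chunk, errors = [], []
    for t in range(steps):
        history.append(target)
        if t % H == 0:  # re-plan only once per chunk
            chunk = plan_chunk(pos, target)
        action = chunk[t % H]
        if correct:  # per-step correction, no re-planning
            action += correction(history)
        pos += action
        target += drift  # the perturbation: the target keeps moving
        errors.append(abs(target - pos))
    return errors


open_loop_errors = rollout(correct=False)
corrected_errors = rollout(correct=True)
print("open-loop final error:", open_loop_errors[-1])
print("corrected final error:", corrected_errors[-1])
```

In this toy setting the open-loop rollout never closes the gap, because each chunk is planned against a stale target position, while the corrected rollout converges without planning more often. The constants `H`, `M`, `drift`, and the initial positions are arbitrary illustration values, not settings from the paper.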
