Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation

Qingwen Bu; Jia Zeng; Li Chen; Yanchao Yang; Guyue Zhou; Junchi Yan; Ping Luo; Heming Cui; Yi Ma; Hongyang Li

Abstract

Despite significant progress in robotics and embodied AI in recent years, deploying robots for long-horizon tasks remains a great challenge. The majority of prior approaches adhere to an open-loop philosophy and lack real-time feedback, leading to error accumulation and poor robustness. A handful of approaches have attempted to establish feedback mechanisms by leveraging pixel-level differences or pre-trained visual representations, yet their efficacy and adaptability remain limited. Inspired by classic closed-loop control systems, we propose CLOVER, a closed-loop visuomotor control framework that incorporates feedback mechanisms to improve adaptive robotic control. CLOVER consists of a text-conditioned video diffusion model that generates visual plans as reference inputs, a measurable embedding space for accurate error quantification, and a feedback-driven controller that refines actions based on feedback and initiates replanning as needed. Our framework shows notable improvements on real-world robotic tasks and achieves state-of-the-art performance on the CALVIN benchmark, improving by 8% over previous open-loop counterparts. Code and checkpoints are maintained at https://github.com/OpenDriveLab/CLOVER.
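The closed-loop behavior described above (follow a visual plan, measure the error to the current sub-goal, advance on success, replan otherwise) can be illustrated with the minimal Python sketch below. It is not CLOVER's implementation: it assumes that states and sub-goals are already embedded as plain vectors, measures error as cosine distance (the quantity visualized in Figure 3), and triggers a replan when a sub-goal is not reached within a fixed step budget. The function names, the thresholds (`reach_threshold`, `max_steps_per_goal`, `max_replans`), and the toy `env_step` and `replan` callables are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; a zero vector is treated as maximally far."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm


def run_closed_loop(
    state: List[float],
    plan: List[List[float]],
    env_step: Callable[[List[float], List[float]], List[float]],
    replan: Callable[[List[float]], List[List[float]]],
    reach_threshold: float = 0.05,   # hypothetical "sub-goal reached" tolerance
    max_steps_per_goal: int = 20,    # hypothetical budget before declaring a stall
    max_replans: int = 3,
) -> List[str]:
    """Follow a list of sub-goal embeddings, advancing or replanning from feedback."""
    log: List[str] = []
    goal_idx, steps, replans = 0, 0, 0
    while goal_idx < len(plan):
        error = cosine_distance(state, plan[goal_idx])
        if error < reach_threshold:
            # Feedback says the sub-goal is achieved: move on to the next one.
            log.append(f"reached sub-goal {goal_idx}")
            goal_idx, steps = goal_idx + 1, 0
            continue
        if steps >= max_steps_per_goal:
            # No success within the budget: treat the sub-goal as infeasible.
            if replans >= max_replans:
                log.append("gave up")
                break
            plan = replan(state)
            goal_idx, steps, replans = 0, 0, replans + 1
            log.append("replanned")
            continue
        # Otherwise act toward the current sub-goal and observe the new state.
        state = env_step(state, plan[goal_idx])
        steps += 1
    return log


def toy_step(state: List[float], goal: List[float]) -> List[float]:
    """Toy dynamics: move the state embedding halfway toward the goal."""
    return [s + 0.5 * (g - s) for s, g in zip(state, goal)]


if __name__ == "__main__":
    plan = [[1.0, 0.0], [0.0, 1.0]]
    print(run_closed_loop([0.7, 0.7], plan, toy_step, replan=lambda s: plan))
```

Running the file prints `['reached sub-goal 0', 'reached sub-goal 1']`: the toy dynamics close the gap to each sub-goal within a few steps, so no replan is needed. A step-budget criterion is only one plausible stand-in for detecting an infeasible sub-goal; the paper's actual replanning trigger may differ.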

Paper Structure (20 sections, 4 equations, 17 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Methodology
Visual Planner
Feedback-Driven Policy
CLOVER
Experiments
Experimental Setup
Main Results
Discussion on Closed-loop vs. Open-loop
Ablation Studies
Conclusion
Examples of Test-time Execution with Error Measurement
Implementation Details
Model Architecture
...and 5 more sections

Figures (17)

Figure 1: Motivation. The proposed CLOVER is inspired by classic closed-loop control in automation systems (a). Our framework (b) employs a visual planner to generate a sequence of sub-goals in advance (see Visual Planner). These sub-goals then guide the policy to generate actions through an error measurement strategy (see Feedback-Driven Policy). Within the feedback loop, the framework automatically replans when a sub-goal is infeasible and advances to the next sub-goal once the current one is achieved (see CLOVER).
Figure 2: Architecture of our feedback-driven policy. (1) The state encoder takes in both the current observation and the synthesized sub-goal. A shared multimodal encoder generates fused RGB-D features, followed by two queries that extract informative features as the current and goal embeddings, respectively. (2) The discrepancy between the two state embeddings is explicitly modeled as the error. (3) The resulting residual from error measurement is finally decoded into the action.
Figure 3: Comparison of the measurement ability of different embeddings. We visualize the cosine distance between the embeddings of observations and generated sub-goals during a roll-out. (a) CLIP features (Radford et al., 2021) and (b) state embeddings trained without error measurement do not exhibit clear relationships across frames. In contrast, (c) state embeddings obtained from our policy are well distributed in the latent space, which facilitates error measurement in the feedback loop.
Figure 4: Real-world robot setting. We propose a long-horizon task comprising three consecutive sub-tasks, where failure in an earlier sub-task inevitably causes all subsequent sub-tasks to fail. Additional single tasks are designed to evaluate the generalizability of CLOVER across various aspects.
Figure 5: Experimental setting of the generalization evaluation. We place entirely new objects, absent from training, alongside the object to be manipulated to introduce visual distractions. We test policies under dynamic conditions by randomly placing and picking up a doll to create unpredictable visual changes.
...and 12 more figures
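Steps (2) and (3) of Figure 2 can be sketched as follows, assuming the error is the element-wise difference between the goal and current embeddings and that the action decoder is a single linear layer. Both are simplifications for illustration: the caption does not specify the exact form of the discrepancy or of the decoder, and the names `embedding_error`, `LinearActionHead`, and `decode_action` are hypothetical. The shared RGB-D encoder and the two queries from step (1) are abstracted away as precomputed embedding vectors.

```python
from typing import List, Sequence


def embedding_error(current: Sequence[float], goal: Sequence[float]) -> List[float]:
    """Explicit error term: element-wise residual goal - current (an assumption)."""
    if len(current) != len(goal):
        raise ValueError("current and goal embeddings must have the same dimension")
    return [g - c for c, g in zip(current, goal)]


class LinearActionHead:
    """Hypothetical stand-in for the action decoder: action = W @ error + b."""

    def __init__(self, weights: List[List[float]], bias: List[float]) -> None:
        if len(weights) != len(bias):
            raise ValueError("one bias entry is required per action dimension")
        self.weights = weights  # shape (action_dim, embed_dim)
        self.bias = bias        # shape (action_dim,)

    def __call__(self, error: Sequence[float]) -> List[float]:
        # One dot product per action dimension, plus its bias.
        return [
            sum(w * e for w, e in zip(row, error)) + b
            for row, b in zip(self.weights, self.bias)
        ]


def decode_action(
    head: LinearActionHead, current: Sequence[float], goal: Sequence[float]
) -> List[float]:
    """Decode an action from the discrepancy between the two state embeddings."""
    return head(embedding_error(current, goal))


if __name__ == "__main__":
    head = LinearActionHead(weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], bias=[0.0, 0.0])
    print(decode_action(head, current=[0.25, 0.5, 0.0], goal=[0.75, 0.5, 0.25]))
```

With these toy weights the script prints `[0.5, 0.25]`, and identical current and goal embeddings yield only the bias term. In this sketch, feeding the decoder the discrepancy rather than two raw embeddings means a reached sub-goal shows up as a near-zero input.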
