When would Vision-Proprioception Policies Fail in Robotic Manipulation?

Jingxian Lu; Wenke Xia; Yuxuan Wu; Zhiwu Lu; Di Hu

TL;DR

We propose the Gradient Adjustment with Phase-guidance (GAP) algorithm, which adaptively modulates the optimization of proprioception. This enables dynamic collaboration within the vision-proprioception policy and yields robust, generalizable vision-proprioception policies.

Abstract

Proprioceptive information is critical for precise servo control because it provides real-time robot states. Its collaboration with vision is generally expected to improve the performance of manipulation policies in complex tasks. However, recent studies have reported inconsistent observations on the generalization of vision-proprioception policies. In this work, we investigate this inconsistency through temporally controlled experiments. We find that during task sub-phases in which the robot's motion transitions, which require target localization, the vision modality of the vision-proprioception policy plays a limited role. Further analysis reveals that during training the policy naturally gravitates toward concise proprioceptive signals that offer faster loss reduction. These signals dominate the optimization and suppress the learning of the visual modality during motion-transition phases. To alleviate this, we propose the Gradient Adjustment with Phase-guidance (GAP) algorithm, which adaptively modulates the optimization of proprioception, enabling dynamic collaboration within the vision-proprioception policy. Specifically, we leverage proprioception to capture robot states and estimate the probability that each timestep in the trajectory belongs to a motion-transition phase. During policy learning, we apply a fine-grained adjustment that reduces the magnitude of proprioception's gradient according to the estimated probabilities, leading to robust and generalizable vision-proprioception policies. Comprehensive experiments demonstrate that GAP is applicable in both simulated and real-world environments, across one-arm and dual-arm setups, and is compatible with both conventional and Vision-Language-Action models. We believe this work offers valuable insights into the development of vision-proprioception policies in robotic manipulation.
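To make the gradient adjustment concrete, the sketch below computes mean-squared-error gradients for a toy linear policy with one vision weight and one proprioception weight. It damps the proprioception gradient at each timestep by a factor of `1 - strength * p_t`, where `p_t` is the estimated motion-transition probability, and leaves the vision gradient untouched. This is a minimal sketch under stated assumptions: the toy model, the linear scaling rule, and the names `gap_style_gradients` and `strength` are hypothetical illustrations, not the paper's formulation, which this summary does not specify.

```python
from typing import Sequence, Tuple


def gap_style_gradients(
    vision: Sequence[float],
    proprio: Sequence[float],
    targets: Sequence[float],
    w_v: float,
    w_p: float,
    transition_prob: Sequence[float],
    strength: float = 0.8,
) -> Tuple[float, float]:
    """MSE gradients for a toy policy a_t = w_v * v_t + w_p * p_t.

    Hypothetical rule: the proprioception gradient at timestep t is multiplied
    by (1 - strength * transition_prob[t]); the vision gradient is not scaled.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    n = len(targets)
    if n == 0 or not (len(vision) == len(proprio) == len(transition_prob) == n):
        raise ValueError("all sequences must be non-empty and equally long")

    grad_v = 0.0
    grad_p = 0.0
    for v, p, y, prob in zip(vision, proprio, targets, transition_prob):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("transition probabilities must lie in [0, 1]")
        residual = w_v * v + w_p * p - y
        # d(residual**2)/dw = 2 * residual * feature
        grad_v += 2.0 * residual * v
        # Damp proprioception more when the step is likely a motion transition.
        scale = 1.0 - strength * prob
        grad_p += scale * 2.0 * residual * p
    return grad_v / n, grad_p / n


if __name__ == "__main__":
    # Second timestep is treated as a certain motion transition.
    print(gap_style_gradients([1.0, 2.0], [1.0, 1.0], [1.0, 1.0],
                              0.0, 0.0, [0.0, 1.0], strength=0.5))
```

With all probabilities at zero the function reduces to ordinary gradients; with `strength` at 1 and `p_t` at 1, the proprioception weight receives no gradient from that timestep, so only the vision weight is pushed to fit the target there. In a deep-learning framework, the same idea could be realized as a gradient-scaling hook on the proprioception branch's features.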

Paper Structure (32 sections, 5 equations, 7 figures, 16 tables, 1 algorithm)


Introduction
Related Works
Vision-Proprioception Policy in Manipulation.
Modality Temporality.
Optimization Analysis of Vision-Proprioception Policies
Method
Motion Representation of Robot
Motion-Transition Phase Estimation
Gradient Adjustment for Modality Collaboration
Experiments
Experimental Setup
Can GAP lead to more robust vision-proprioception policies?
Does GAP enhance the utilization of vision modality?
Is GAP compatible with Vision-Language-Action models?
Can GAP be applied to various modality fusion approaches?
...and 17 more sections

Figures (7)

Figure 1: The generalization of vision-proprioception policies. (left) Vision-proprioception policies perform 15.8% worse than vision-only policies. (right) We investigate this by intervening in the task execution of the vision-only policy during different periods, switching to the vision-proprioception policy. Such intervention has minimal impact during motion-consistent phases. However, during motion-transition phases, switching leads to noticeable degradation, indicating that the vision modality fails to take effect.
Figure 2: The pipeline of our Gradient Adjustment with Phase-guidance (GAP) algorithm. We define the motion representation and identify the motion-consistent phases by minimizing the total cost between the phase motion and each adjacent motion. Motion-transition phase indicators are then estimated and used to reduce the magnitude of proprioception's backward gradient. GAP enables vision-proprioception policies to effectively utilize proprioception without suppressing the vision modality.
Figure 3: Visualization of real-world tasks. Our experiments cover a wide range of manipulation tasks in both one-arm and dual-arm setups.
Figure 4: The intervention experiment on the GAP-equipped vision-proprioception policy. The slight changes in success rate indicate that GAP enhances the utilization of the vision modality.
Figure 5: Visualization of Motion-Transition Phase Estimation.
...and 2 more figures
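Figure 2 describes identifying motion-consistent phases by minimizing a total cost between phase motion and per-step motion. One way to realize this, under assumptions this summary does not confirm, is optimal contiguous segmentation by dynamic programming. Each per-step motion is a vector (for example, the change in end-effector position between consecutive proprioceptive readings), a phase's motion is the mean of its members, the cost is squared Euclidean distance, and the number of phases is fixed in advance. The soft transition indicator, a Gaussian bump centred on each phase boundary, is likewise a hypothetical stand-in for GAP's probability estimate. All function names are illustrative.

```python
import math
from typing import List, Sequence

Motion = Sequence[float]


def segment_cost(motions: Sequence[Motion], start: int, end: int) -> float:
    """Sum of squared distances from each motion in [start, end) to the segment mean."""
    length = end - start
    dim = len(motions[start])
    mean = [sum(motions[t][d] for t in range(start, end)) / length for d in range(dim)]
    return sum((motions[t][d] - mean[d]) ** 2
               for t in range(start, end) for d in range(dim))


def segment_phases(motions: Sequence[Motion], num_phases: int) -> List[int]:
    """Split motions into num_phases contiguous phases with minimal total cost.

    Returns the start index of every phase after the first (the boundaries).
    """
    n = len(motions)
    if not 1 <= num_phases <= n:
        raise ValueError("need 1 <= num_phases <= len(motions)")
    inf = float("inf")
    # best[k][j]: minimal cost of splitting motions[:j] into k phases.
    best = [[inf] * (n + 1) for _ in range(num_phases + 1)]
    back = [[0] * (n + 1) for _ in range(num_phases + 1)]
    best[0][0] = 0.0
    for k in range(1, num_phases + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last phase covers motions[i:j]
                if best[k - 1][i] == inf:
                    continue
                cost = best[k - 1][i] + segment_cost(motions, i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    back[k][j] = i
    # Walk the back-pointers to recover where each phase starts.
    boundaries = []
    j = n
    for k in range(num_phases, 0, -1):
        i = back[k][j]
        if k > 1:
            boundaries.append(i)
        j = i
    return sorted(boundaries)


def transition_probability(num_steps: int, boundaries: Sequence[int],
                           sigma: float = 1.0) -> List[float]:
    """Hypothetical soft indicator: Gaussian bump centred on each phase boundary."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    probs = []
    for t in range(num_steps):
        if not boundaries:
            probs.append(0.0)
            continue
        d = min(abs(t - b) for b in boundaries)
        probs.append(math.exp(-(d ** 2) / (2.0 * sigma ** 2)))
    return probs


if __name__ == "__main__":
    # Three stationary steps, three steps along x, then two steps along y.
    demo = [[0.0, 0.0]] * 3 + [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2
    cuts = segment_phases(demo, 3)
    print(cuts)  # → [3, 6]
    print(transition_probability(len(demo), cuts))
```

For the demo trajectory, the segmentation places boundaries at steps 3 and 6, and the indicator peaks at exactly those steps before decaying with distance. The dynamic program is cubic in trajectory length, which is acceptable for short demonstrations; prefix sums over the motions would make the segment cost constant-time for longer ones.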
