Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework

Jian-Jian Jiang; Xiao-Ming Wu; Yi-Xiang He; Ling-An Zeng; Yi-Lin Wei; Dandan Zhang; Wei-Shi Zheng

TL;DR

The paper tackles the challenge of learning bimanual manipulation by recognizing that tasks can be either uncoordinated or coordinated and that integrated control struggles with high-dimensional joint actions and phase-dependent cooperation. It introduces a Decoupled Interaction Framework that assigns independent policies per arm to simplify learning of uncoordinated tasks, coupled with a selective interaction module that adaptively modulates cross-arm information to support coordination. Empirical results on the RoboTwin benchmark show substantial improvements over state-of-the-art methods (e.g., a 23.5% average gain) and strong scalability to multi-agent scenarios, along with robust real-world performance. The work demonstrates the value of task-aware decoupling and selective interaction for efficient, flexible, and scalable bimanual and multi-agent manipulation, with code to be released to the community.

Abstract

Bimanual robotic manipulation is an emerging and critical topic in the robotics community. Previous works primarily rely on integrated control models that take the perceptions and states of both arms as inputs to directly predict their actions. However, we argue that bimanual manipulation involves not only coordinated tasks but also various uncoordinated tasks that do not require explicit cooperation during execution, such as grasping objects with the closest hand. Integrated control frameworks fail to account for these tasks because they enforce cooperation from the earliest inputs. In this paper, we propose a novel decoupled interaction framework that considers the characteristics of different tasks in bimanual manipulation. The key insight of our framework is to assign an independent model to each arm to enhance the learning of uncoordinated tasks, while introducing a selective interaction module that adaptively learns weights from its own arm to improve the learning of coordinated tasks. Extensive experiments on seven tasks in the RoboTwin dataset demonstrate that: (1) Our framework achieves outstanding performance, with a 23.5% boost over the SOTA method. (2) Our framework is flexible and can be seamlessly integrated into existing methods. (3) Our framework can be effectively extended to multi-agent manipulation tasks, achieving a 28% boost over the integrated control SOTA. (4) The performance boost stems from the decoupled design itself, surpassing the SOTA by 16.5% in success rate with only 1/6 of the model size.
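To make the decoupled design concrete, the following is a minimal sketch of one possible reading of the abstract, not the authors' implementation. Each arm owns an independent policy, and a gate computed only from that arm's own features decides how much of the other arm's features to mix in. The class `ArmPolicy`, the function `step`, the additive fusion rule, the sigmoid gate, and all dimensions are assumptions introduced here for illustration; the abstract does not specify the architecture of the selective interaction module.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: `weights` is a list of rows, `x` a list of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ArmPolicy:
    """Hypothetical per-arm policy: own encoder, own gate, own action head."""

    def __init__(self, obs_dim, feat_dim, act_dim, rng):
        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.enc_w, self.enc_b = init(feat_dim, obs_dim), [0.0] * feat_dim
        self.gate_w, self.gate_b = init(1, feat_dim), [0.0]
        self.head_w, self.head_b = init(act_dim, feat_dim), [0.0] * act_dim

    def encode(self, obs):
        """Encode this arm's observation without looking at the other arm."""
        return [math.tanh(v) for v in linear(self.enc_w, self.enc_b, obs)]

    def gate(self, own_feat):
        """Interaction weight in (0, 1), computed from own features only."""
        return sigmoid(linear(self.gate_w, self.gate_b, own_feat)[0])

    def act(self, own_feat, other_feat):
        """Mix in the other arm's features, scaled by the gate (assumed rule)."""
        g = self.gate(own_feat)
        fused = [o + g * p for o, p in zip(own_feat, other_feat)]
        return linear(self.head_w, self.head_b, fused), g


def step(left, right, obs_left, obs_right):
    """One control step: encode separately, then interact selectively."""
    f_l, f_r = left.encode(obs_left), right.encode(obs_right)
    return left.act(f_l, f_r), right.act(f_r, f_l)


rng = random.Random(0)
left = ArmPolicy(obs_dim=3, feat_dim=4, act_dim=2, rng=rng)
right = ArmPolicy(obs_dim=3, feat_dim=4, act_dim=2, rng=rng)
(act_l, g_l), (act_r, g_r) = step(left, right, [0.1, 0.2, 0.3], [0.3, 0.1, -0.2])
```

In this sketch, a gate near 0 makes an arm behave as a fully independent policy, which corresponds to the uncoordinated case, while a gate near 1 lets the other arm's features shape the action, which corresponds to the coordinated case. In a trained system the gate parameters would be learned from data rather than drawn at random as they are here.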
