AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation

Guanxing Lu, Tengbo Yu, Haoyuan Deng, Season Si Chen, Yansong Tang, Ziwei Wang

TL;DR

The paper tackles the data bottleneck in general bimanual manipulation by transferring knowledge from pretrained unimanual policies. It introduces AnyBimanual, a model-agnostic framework comprising a skill manager to schedule unimanual primitives and a visual aligner to mitigate observation gaps, enabling plug-and-play transfer with few bimanual demonstrations. Through a joint objective combining behavior cloning with sparsity- and voxel-alignment regularizations, the approach achieves strong results on RLBench2 (average 32.00% success) and real-world tasks (84.62% average), including effective transfers to multiple baselines. While showing practical multi-task capability, the work notes limitations in cross-embodiment transfer and zero-shot generalization, pointing to future work on broader cross-task adaptability and improved perception alignment.
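
As a rough illustration of the joint objective mentioned above, one schematic way to write it is given below. The symbols $\mathcal{L}_{\text{BC}}$, $\mathcal{L}_{\text{sparse}}$, $\mathcal{L}_{\text{voxel}}$ and the weights $\lambda_{s}$, $\lambda_{v}$ are notational assumptions made here for illustration; the exact terms and weighting used in the paper may differ.

```latex
\mathcal{L}
  = \mathcal{L}_{\text{BC}}
  + \lambda_{s}\,\mathcal{L}_{\text{sparse}}
  + \lambda_{v}\,\mathcal{L}_{\text{voxel}}
```

Here $\mathcal{L}_{\text{BC}}$ stands for the behavior cloning loss on the bimanual demonstrations, while the two remaining terms stand for the sparsity and voxel-alignment regularizers.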

Abstract

Performing general language-conditioned bimanual manipulation tasks is of great importance for many applications ranging from household service to industrial assembly. However, collecting bimanual manipulation data is expensive due to the high-dimensional action space, which makes it difficult for conventional methods to handle general bimanual manipulation tasks. In contrast, unimanual policies have recently demonstrated impressive generalizability across a wide range of tasks because of scaled model parameters and training data, and can therefore provide sharable manipulation knowledge for bimanual systems. To this end, we propose a plug-and-play method named AnyBimanual, which transfers a pre-trained unimanual policy to a general bimanual manipulation policy with few bimanual demonstrations. Specifically, we first introduce a skill manager that dynamically schedules the skill representations discovered from the pre-trained unimanual policy for bimanual manipulation tasks; it linearly combines skill primitives with a task-oriented compensation to represent the bimanual manipulation instruction. To mitigate the observation discrepancy between unimanual and bimanual systems, we present a visual aligner that generates soft masks for the visual embedding of the workspace, aiming to align the visual input of the unimanual policy model for each arm with the input seen during the pretraining stage. AnyBimanual shows superiority on 12 simulated tasks from RLBench2 with a sizable 12.67% improvement in success rate over previous methods. Experiments on 9 real-world tasks further verify its practicality with an average success rate of 84.62%.
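
The two components described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the softmax used to produce the combination weights, the sigmoid used to produce the soft mask, and all tensor shapes are assumptions chosen for illustration only.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def combine_skills(primitives, logits, compensation):
    """Linearly combine skill primitives and add a compensation vector.

    primitives:   (K, D) array, one D-dimensional embedding per skill (assumed shape)
    logits:       (K,) unnormalized scores, e.g. predicted per timestep (assumed)
    compensation: (D,) task-oriented compensation embedding (assumed shape)
    Returns the combined (D,) instruction embedding and the (K,) weights.
    """
    weights = softmax(logits)  # assumption: weights are normalized with a softmax
    return weights @ primitives + compensation, weights


def apply_soft_mask(voxel_embedding, mask_logits):
    """Reweight a voxel embedding with a soft mask in (0, 1).

    voxel_embedding: (X, Y, Z, C) visual embedding of the workspace (assumed shape)
    mask_logits:     (X, Y, Z, 1) unnormalized mask scores for one arm (assumed shape)
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))  # sigmoid keeps the mask soft
    return mask * voxel_embedding, mask  # broadcasts over the channel axis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    primitives = rng.normal(size=(4, 8))  # 4 hypothetical skills, 8-dim embeddings
    instruction, weights = combine_skills(primitives, rng.normal(size=4), np.zeros(8))
    print(instruction.shape, weights.sum())

    voxels = rng.normal(size=(2, 2, 2, 8))
    masked, mask = apply_soft_mask(voxels, rng.normal(size=(2, 2, 2, 1)))
    print(masked.shape, mask.shape)
```

In a bimanual setting one would presumably run this once per arm, so that each unimanual policy receives its own combined instruction embedding and its own masked view of the workspace.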

Paper Structure

This paper contains 31 sections, 6 equations, 17 figures, 7 tables.

Figures (17)

  • Figure 1: AnyBimanual enables plug-and-play transfer from pretrained unimanual policies to a bimanual manipulation policy, preserving their generalizability through the proposed skill scheduling framework.
  • Figure 2: The overall pipeline of AnyBimanual, which primarily consists of a skill manager and a perception manager. The skill manager adaptively coordinates primitive skills for each robot arm, while the perception manager mitigates the distributional shift from unimanual to bimanual by decomposing the 3D voxel observation for each arm.
  • Figure 3: Shareable skills across unimanual and bimanual settings. We observe that bimanual tasks often originate from combinations of unimanual sub-tasks, and can thus be solved by effectively coordinating unimanual skills synchronously or asynchronously.
  • Figure 4: Visualization of AnyBimanual. This figure shows, at different key timesteps, how the skill manager dynamically schedules skill weights and how the visual aligner decomposes the volumetric observation. We use a logarithmic scale for visualization.
  • Figure 5: Real-World Tasks. The real-world experiments are performed in a tabletop setup with objects randomized in location every episode. AnyBimanual can simultaneously conduct $9$ complex real-world bimanual manipulation tasks with one model. Different colors mean different success rates.
  • ...and 12 more figures