
EquiBim: Learning Symmetry-Equivariant Policy for Bimanual Manipulation

Zhiyuan Zhang, Aditya Mohan, Seungho Han, Wan Shou, Dongyi Wang, Yu She

TL;DR

EquiBim is a symmetry-equivariant policy learning framework for bimanual manipulation that enforces bilateral equivariance between observations and actions during training. Its results suggest that explicitly enforcing physical symmetry provides a simple yet effective inductive bias for bimanual robot learning.

Abstract

Robotic imitation learning has achieved impressive success in learning complex manipulation behaviors from demonstrations. However, many existing robot learning methods do not explicitly account for the physical symmetries of robotic systems, often resulting in asymmetric or inconsistent behaviors under symmetric observations. This limitation is particularly pronounced in dual-arm manipulation, where bilateral symmetry is inherent to both the robot morphology and the structure of many tasks. In this paper, we introduce EquiBim, a symmetry-equivariant policy learning framework for bimanual manipulation that enforces bilateral equivariance between observations and actions during training. Our approach formulates physical symmetry as a group action on both observation and action spaces, and imposes an equivariance constraint on policy predictions under symmetric transformations. The framework is model-agnostic and can be integrated into a wide range of imitation learning pipelines with diverse observation modalities and action representations, including point cloud-based and image-based policies, as well as both end-effector-space and joint-space parameterizations. We evaluate EquiBim in simulation on RoboTwin, a dual-arm robotic platform with symmetric kinematics, across diverse observation and action configurations. We further validate the approach on a real-world dual-arm system. Across both simulation and physical experiments, our method consistently improves performance and robustness under distribution shifts. These results suggest that explicitly enforcing physical symmetry provides a simple yet effective inductive bias for bimanual robot learning.
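
The equivariance constraint described in the abstract can be written out as a short sketch, using the symbols that appear in the figure captions ($\mathcal{S}$, $\mathcal{O}$, $\pi$, $\mathcal{L}_{\mathrm{sym}}$). A left–right reflection applied twice restores the original scene, so $\mathcal{S}$ generates a two-element group acting on both observations and actions. The squared-norm penalty, the imitation term $\mathcal{L}_{\mathrm{IL}}$, and the weight $\lambda$ below are illustrative assumptions, not the objective stated in the paper.

$$
\mathcal{S} \circ \mathcal{S} = \mathrm{id}, \qquad \pi(\mathcal{S}(\mathcal{O})) = \mathcal{S}(\pi(\mathcal{O}))
$$

$$
\mathcal{L}_{\mathrm{sym}} = \mathbb{E}_{\mathcal{O}} \left\| \pi(\mathcal{S}(\mathcal{O})) - \mathcal{S}(\pi(\mathcal{O})) \right\|_2^2, \qquad \mathcal{L} = \mathcal{L}_{\mathrm{IL}} + \lambda \, \mathcal{L}_{\mathrm{sym}}
$$

In each comparison, $\mathcal{S}$ acts on the observation space inside $\pi$ and on the action space outside it. Because the penalty only involves policy outputs, a soft constraint of this form would leave the network architecture unchanged.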

Paper Structure (13 sections, 4 equations, 6 figures, 3 tables)

This paper contains 13 sections, 4 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Symmetry-equivariant policy learning for bimanual manipulation. A symmetry transformation $\mathcal{S}$ defines a left–right exchange of the scene in the image coordinate frame, generating a symmetrically equivalent task instance. Given an observation $\mathcal{O}$ and its transformed counterpart $\mathcal{S}(\mathcal{O})$, the shared policy $\pi$ is trained to produce equivariant predictions under the same transformation, i.e., $\pi(\mathcal{S}(\mathcal{O})) \approx \mathcal{S}(\pi(\mathcal{O}))$. The symmetry constraint is imposed at the action level and does not require architectural modifications.
  • Figure 2: Overview of EquiBim. Given an observation $\mathcal{O}$ and its symmetrically transformed counterpart $\mathcal{S}(\mathcal{O})$, where $\mathcal{S}$ denotes a left–right reflection defined in the image coordinate frame, both inputs are processed by a shared policy $\pi$. The policy takes visual observations together with proprioceptive states as input and produces action predictions $\pi(\mathcal{O})$ and $\pi(\mathcal{S}(\mathcal{O}))$. A symmetry-equivariant loss $\mathcal{L}_{\mathrm{sym}}$ enforces the predicted actions to transform consistently under $\mathcal{S}$ during training.
  • Figure 3: Visualization of eight simulated bimanual manipulation tasks used for evaluation. Subfigures (a)–(g) are adapted from RoboTwin [chen2025robotwin]. For each task, the left image shows the initial state and the right image shows the goal state.
  • Figure 4: Overview of the real-world bimanual manipulation setup. Two LeRobot SO101 robotic arms operate in a shared workspace and are observed by a Logitech C920x camera. The system is evaluated on object handover and hook-hanging tasks under a top-down camera configuration.
  • Figure 5: Real-world task executions on the bimanual LeRobot platform. The figure presents three tasks, namely Object Handover (Banana), Hook Hanging (Drumstick), and Hook Hanging (Toy Chicken), each depicted across three stages: initial, intermediate, and final. Each row shows the temporal progression of the manipulation process under a top-down RGB camera view.
  • ...and 1 more figures
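
The training signal described in the captions for Figures 1 and 2 can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration: the per-arm layout `[x, y, gripper]`, the names `ARM_DIM`, `MIRROR_SIGN`, `reflect_vector`, `reflect_observation`, and `symmetry_loss`, and the mean-squared form of the penalty. The transformation the paper actually applies to end-effector poses or joint angles is not given in this section.

```python
import numpy as np

ARM_DIM = 3  # assumed toy per-arm layout: [x, y, gripper]
# Assumption: a left-right reflection negates only the lateral (x) component.
MIRROR_SIGN = np.array([-1.0, 1.0, 1.0])


def reflect_vector(v):
    """Apply S to a bimanual vector laid out as [left arm, right arm]."""
    left, right = v[..., :ARM_DIM], v[..., ARM_DIM:]
    # The mirrored left arm is the original right arm with x negated, and vice versa.
    return np.concatenate([right * MIRROR_SIGN, left * MIRROR_SIGN], axis=-1)


def reflect_observation(image, state):
    """Apply S to an observation: flip the (H, W, C) image along its width
    and mirror the proprioceptive state."""
    return image[:, ::-1, :].copy(), reflect_vector(state)


def symmetry_loss(policy, image, state):
    """Mean squared gap between pi(S(O)) and S(pi(O))."""
    action = policy(image, state)
    action_from_mirrored = policy(*reflect_observation(image, state))
    return float(np.mean((action_from_mirrored - reflect_vector(action)) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((4, 6, 3))
    state = rng.normal(size=2 * ARM_DIM)

    # Toy policies (hypothetical): one commutes with S, one favors the left arm.
    equivariant_policy = lambda img, s: 0.5 * s
    biased_policy = lambda img, s: s + np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])

    print("equivariant policy loss:", symmetry_loss(equivariant_policy, image, state))
    print("biased policy loss:", symmetry_loss(biased_policy, image, state))
```

In this toy setting the scaling policy commutes with the reflection and incurs no penalty, while the policy with a left-arm offset is penalized. In training, a term like this would be added to the usual imitation loss and both forward passes would share the same policy weights, as Figure 2 describes.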