Shadow: Leveraging Segmentation Masks for Cross-Embodiment Policy Transfer

Marion Lepert, Ria Doshi, Jeannette Bohg

TL;DR

Shadow addresses cross-embodiment transfer by training a policy on a single source robot using edited observations in which both the source and target robots are represented by segmentation masks aligned to the same end-effector pose $x_{ee}$. During training, the source robot in each image is masked and a rendered mask of the target is overlaid; at evaluation the roles are reversed, so train and test input distributions stay close without collecting any target-robot data. The approach achieves data-efficient zero-shot transfer: it outperforms in-painting baselines such as Mirage, approaches or matches source-robot performance on several tasks in both simulation and real hardware, and improves average success rate by over $2\times$ relative to the strongest baseline in real-world experiments. Its limitations concern camera calibration, scene generalization, and per-embodiment policy training. Shadow thus offers a practical, scalable path for leveraging existing robot data to generalize policies to new hardware.

Abstract

Data collection in robotics is spread across diverse hardware, and this variation will increase as new hardware is developed. Effective use of this growing body of data requires methods capable of learning from diverse robot embodiments. We consider the setting of training a policy using expert trajectories from a single robot arm (the source), and evaluating on a different robot arm for which no data was collected (the target). We present a data editing scheme termed Shadow, in which the robot during training and evaluation is replaced with a composite segmentation mask of the source and target robots. In this way, the input data distributions at train and test time match closely, enabling robust policy transfer to the new unseen robot while being far more data efficient than approaches that require co-training on large amounts of data from diverse embodiments. We demonstrate that an approach as simple as Shadow is effective both in simulation on varying tasks and robots, and on real robot hardware, where Shadow demonstrates an average of over 2x improvement in success rate compared to the strongest baseline.
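
The composite-mask edit described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `shadow_edit`, the flat fill values, and the choice to paint the target mask last are all assumptions made for illustration, and the masks are assumed to be given (one segmented from the camera image, the other rendered at the same end-effector pose $x_{ee}$).

```python
import numpy as np


def shadow_edit(image, source_mask, target_mask,
                source_fill=(0, 0, 0), target_fill=(255, 255, 255)):
    """Overlay source- and target-robot masks on an RGB observation.

    image:       (H, W, 3) uint8 array.
    source_mask: (H, W) bool array, True where the source robot is.
    target_mask: (H, W) bool array, True where the target robot is.
    The fill values and paint order are hypothetical choices.
    """
    if image.shape[:2] != source_mask.shape or image.shape[:2] != target_mask.shape:
        raise ValueError("masks must match the image height and width")
    edited = image.copy()               # leave the input observation untouched
    edited[source_mask] = source_fill   # hide the source robot's pixels
    edited[target_mask] = target_fill   # painted last, so it wins on overlap
    return edited


# Two observations that differ only in the appearance of the robot pixels.
background = np.full((4, 4, 3), 10, dtype=np.uint8)
train_obs = background.copy()
train_obs[1, 1] = (200, 50, 50)         # real source robot visible (training)
eval_obs = background.copy()
eval_obs[1, 2] = (50, 50, 200)          # real target robot visible (evaluation)

source_mask = np.zeros((4, 4), dtype=bool)
source_mask[1, 1] = True
target_mask = np.zeros((4, 4), dtype=bool)
target_mask[1, 2] = True

train_edit = shadow_edit(train_obs, source_mask, target_mask)
eval_edit = shadow_edit(eval_obs, source_mask, target_mask)
```

In this toy case the two edited images are identical even though a different robot was physically in view, which is the property the abstract relies on: the policy sees the same composite mask at train and test time.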

Paper Structure

This paper contains 24 sections, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Schematic of Shadow. Policy is trained on source robot and evaluated on the target robot. No data is collected for the target robot. Observation images are overlaid with the segmentation masks of the source and the target robots. During training, the target robot is rendered to be at the same end-effector pose, $x_{ee}$, as the source robot and vice-versa during evaluation.
  • Figure 2: In simulation, we evaluate the effectiveness of Shadow in transferring a policy trained on a Panda with a Robotiq 2F85 gripper to the target robot/gripper combinations shown here. On real-world hardware, we transfer to the Panda robot with the Franka gripper and the UR5e robot with the Robotiq 2F85 gripper.
  • Figure 3: The six tasks evaluated in simulation, shown from the viewpoint used during training.
  • Figure 4: Different robot, different gripper: Evaluation in simulation over six tasks and four target robots (Panda, Sawyer, IIWA, UR5e; each with the Franka gripper) (100 roll-outs per evaluation). The source robot is Panda+Robotiq 2F85 gripper. Dashed line: policy trained and evaluated on source robot. $*$ denotes statistically inferior to Shadow ($p < 0.05$). Shadow outperforms Mirage on all tasks, and shows no performance degradation compared to source robot in 5/6 tasks.
  • Figure 5: Different robot, same gripper: Evaluation in simulation over six tasks and three target robots (Sawyer, IIWA, UR5e; each with the Robotiq 2F85 gripper) (100 roll-outs per evaluation). The source robot is Panda+Robotiq 2F85 gripper. Dashed line: policy trained and evaluated on source robot. $*$ denotes statistically inferior to Shadow ($p < 0.05$). Shadow outperforms Mirage on all tasks, and shows no performance degradation compared to the source robot in 5/6 tasks.
  • ...and 4 more figures
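
The captions for Figures 4 and 5 mark baselines that are statistically inferior to Shadow ($p < 0.05$) over 100 roll-outs per evaluation, but they do not say which test was used. As one plausible way to compare two success rates, the sketch below uses a one-sided pooled two-proportion z-test. The choice of test, the function name, and the example counts are assumptions for illustration, not results from the paper.

```python
import math


def one_sided_p_value(successes_a, successes_b, n_a=100, n_b=100):
    """P-value for H1: method A has a higher success rate than method B.

    Uses a pooled two-proportion z-test (an assumed choice of test).
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0.0:
        return 1.0  # both methods always succeed or always fail: no evidence
    z = (rate_a - rate_b) / std_err
    # Upper-tail probability of a standard normal, P(Z > z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical counts, not taken from the paper.
p_clear_gap = one_sided_p_value(90, 60)   # large gap in success rate
p_small_gap = one_sided_p_value(80, 78)   # gap within sampling noise
```

With 100 roll-outs per method, a two-point gap is not distinguishable from noise under this test, while a thirty-point gap is.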