Data Analogies Enable Efficient Cross-Embodiment Transfer

Jonathan Yang, Chelsea Finn, Dorsa Sadigh

TL;DR

This work asks what form of demonstration data is most useful for enabling transfer across robot set-ups. It runs controlled experiments that vary end-effector morphology, robot platform appearance, and camera perspective, and compares simply scaling the number of demonstrations against systematically broadening their diversity in different ways.

Abstract

Generalist robot policies are trained on demonstrations collected across a wide variety of robots, scenes, and viewpoints. Yet it remains unclear how to best organize and scale such heterogeneous data so that it genuinely improves performance in a given target setting. In this work, we ask: what form of demonstration data is most useful for enabling transfer across robot set-ups? We conduct controlled experiments that vary end-effector morphology, robot platform appearance, and camera perspective, and compare the effects of simply scaling the number of demonstrations against systematically broadening the diversity in different ways. Our simulated experiments show that while perceptual shifts such as viewpoint benefit most from broad diversity, morphology shifts benefit far less from unstructured diversity and instead see the largest gains from data analogies, i.e. paired demonstrations that align scenes, tasks, and/or trajectories across different embodiments. Informed by the simulation results, we improve real-world cross-embodiment transfer success by an average of $22.5\%$ over large-scale, unpaired datasets by changing only the composition of the data.
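
To make the idea of a data analogy concrete, the following is a minimal sketch of how paired demonstrations might be assembled by matching scene and task across two embodiments. The `Demo` record, its field names, the `build_analogies` helper, and the example labels are all hypothetical and chosen for illustration; they are not taken from the paper, which may pair data differently (for example at the trajectory level).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Demo:
    """Hypothetical demonstration record (field names are assumptions)."""
    embodiment: str
    scene: str
    task: str
    demo_id: int


def build_analogies(demos, source, target):
    """Pair source and target demos that share the same (scene, task)."""
    # Group demos first by (scene, task), then by embodiment.
    by_key = defaultdict(lambda: defaultdict(list))
    for d in demos:
        by_key[(d.scene, d.task)][d.embodiment].append(d)

    pairs = []
    for key in sorted(by_key):
        group = by_key[key]
        # zip pairs demos one-to-one; extras without a partner stay unpaired.
        pairs.extend(zip(group[source], group[target]))
    return pairs


# Illustrative data: only the first two demos share both scene and task.
demos = [
    Demo("franka", "kitchen", "pen_in_cup", 0),
    Demo("widowx", "kitchen", "pen_in_cup", 1),
    Demo("franka", "office", "book_on_shelf", 2),
    Demo("widowx", "lab", "book_on_shelf", 3),
]
pairs = build_analogies(demos, source="franka", target="widowx")
```

In this toy example only demos 0 and 1 form an analogy; the two `book_on_shelf` demos share a task but not a scene, so under this strict matching rule they count as unpaired data.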

Paper Structure

This paper contains 19 sections, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: Cross-Embodiment Data Analogies. We study how to collect data so that demonstrations from one robot directly help another. Our data-centric recipe composes breadth (coverage across viewpoints, morphologies, and scenes). We then search over data scaling strategies to find one that achieves high performance under a fixed budget. We find that datasets combining high coverage with high pairing between scenes and tasks yield high transfer performance.
  • Figure 2: Domain Shift Axes. We study the role of data diversity and pairing across three domain shift axes: end-effector morphology, camera perspective, and visual appearance.
  • Figure 3: Coverage versus Pairing. Simulation images depicting the data collection strategies. Coverage is the diversity of data on the generalization axis, while pairing is the similarity of the tasks or trajectories in the data.
  • Figure 4: Sim and Real-World Robots. We train a cross-embodiment policy to transfer the tasks of putting a pen in a cup and putting a book on a bookshelf to a new robot. We evaluate with the Franka Emika Panda, WidowX, and ARX Piper robot arms.
  • Figure 5: Main coverage plot (Success Rate). Success Rate (%) on the target robot across Coverage$\times$Pairing for (a) Viewpoint, (b) Morphology, (c) Appearance. Error bars: 95% CI. Dashed lines show Target-only (few-shot) and Target upper bound (same extra budget on target).
  • ...and 6 more figures
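
The Figure 5 caption reports success rates with 95% confidence intervals. As a rough sketch of how such an interval could be computed from a count of successful rollouts, the snippet below uses the Wilson score interval. The choice of Wilson is an assumption: the text here does not say which method the authors used, and the function name and the example counts are illustrative only.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate.

    z=1.96 corresponds to a two-sided 95% interval. This method is an
    assumption for illustration, not necessarily the paper's procedure.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials))
        / denom
    )
    return center - half_width, center + half_width


# Illustrative counts only: 15 successes out of 20 evaluation rollouts.
low, high = wilson_interval(15, 20)
```

With the small rollout counts typical of real-robot evaluation, the Wilson interval stays inside [0, 1] and behaves sensibly near 0% or 100% success, which is why it is often preferred over the plain normal approximation.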