Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training

Shuo Cheng, Liqian Ma, Zhenyang Chen, Ajay Mandlekar, Caelan Garrett, Danfei Xu

TL;DR

This work proposes a unified sim-and-real co-training framework for learning generalizable manipulation policies that relies primarily on simulation and requires only a few real-world demonstrations. It embeds an Optimal Transport-inspired loss within the co-training framework and extends it to Unbalanced Optimal Transport to handle the imbalance between abundant simulation data and limited real-world examples.

Abstract

Behavior cloning has shown promise for robot manipulation, but real-world demonstrations are costly to acquire at scale. While simulated data offers a scalable alternative, particularly with advances in automated demonstration generation, transferring policies to the real world is hampered by the gaps between the simulation and real domains. In this work, we propose a unified sim-and-real co-training framework for learning generalizable manipulation policies that primarily leverages simulation and only requires a few real-world demonstrations. Central to our approach is learning a domain-invariant, task-relevant feature space. Our key insight is that aligning the joint distributions of observations and their corresponding actions across domains provides a richer signal than aligning observations (marginals) alone. We achieve this by embedding an Optimal Transport (OT)-inspired loss within the co-training framework, and we extend it to an Unbalanced OT framework to handle the imbalance between abundant simulation data and limited real-world examples. We validate our method on challenging manipulation tasks, showing it can leverage abundant simulation data to achieve up to a 30% improvement in the real-world success rate and even generalize to scenarios seen only in simulation. Project webpage: https://ot-sim2real.github.io/.
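
To make the idea of aligning joint observation-action distributions concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a joint ground cost of the form "squared latent distance plus weighted squared action distance" and an entropic Unbalanced OT solver with KL-relaxed marginals (the standard scaling iterations). The function names, the `action_weight`, `eps`, and `tau` parameters, and the max-normalization of the cost are all illustrative assumptions.

```python
import numpy as np


def pairwise_sq_dists(x, y):
    """Squared Euclidean distances between rows of x (n, d) and y (m, d)."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def joint_cost(z_sim, a_sim, z_real, a_real, action_weight=1.0):
    """Hypothetical joint cost: latent distance plus weighted action distance.

    Pairs are cheap to match only if both their features and their actions
    agree, which is what distinguishes joint alignment from marginal alignment.
    """
    return pairwise_sq_dists(z_sim, z_real) + action_weight * pairwise_sq_dists(a_sim, a_real)


def unbalanced_sinkhorn(cost, eps=0.1, tau=1.0, n_iters=200):
    """Entropic unbalanced OT plan with KL-relaxed marginals (scaling iterations).

    A smaller tau lets the plan leave mass untransported, so simulation samples
    with no real counterpart in the mini-batch need not be matched.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform weights on simulation samples
    b = np.full(m, 1.0 / m)  # uniform weights on real samples
    # Assumption: rescale the cost to [0, 1] so the kernel does not underflow.
    scaled = cost / max(cost.max(), 1e-12)
    kernel = np.exp(-scaled / eps)
    u, v = np.ones(n), np.ones(m)
    power = tau / (tau + eps)  # exponent below 1 softens the marginal constraints
    for _ in range(n_iters):
        u = (a / (kernel @ v)) ** power
        v = (b / (kernel.T @ u)) ** power
    return u[:, None] * kernel * v[None, :]


def ot_alignment_loss(z_sim, a_sim, z_real, a_real, **solver_kwargs):
    """Transport cost under the unbalanced plan, usable as an alignment penalty."""
    cost = joint_cost(z_sim, a_sim, z_real, a_real)
    plan = unbalanced_sinkhorn(cost, **solver_kwargs)
    return float(np.sum(plan * cost))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Imbalanced mini-batch: many simulation samples, few real samples.
    z_sim, a_sim = rng.normal(size=(32, 8)), rng.normal(size=(32, 2))
    z_real, a_real = rng.normal(size=(4, 8)), rng.normal(size=(4, 2))
    print("alignment loss:", ot_alignment_loss(z_sim, a_sim, z_real, a_real))
```

In an actual co-training loop this term would be computed in an autodiff framework on encoder outputs and added to the behavior cloning loss; NumPy is used here only to keep the sketch self-contained.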

Paper Structure

This paper contains 26 sections, 4 equations, 11 figures, 9 tables, 1 algorithm.

Figures (11)

  • Figure 1: Sim-and-Real Co-Training with Optimal Transport. We use behavior cloning to train a real-world policy from sparse real-world and dense simulation demos. Leveraging Optimal Transport to align feature spaces, our method enables generalization to scenarios seen only in simulation.
  • Figure 2: Method Overview. Our sim-and-real co-training framework learns a domain-invariant latent space to improve real-world performance using abundant simulation demos and a small number of real-world demos. It leverages an Unbalanced Optimal Transport loss and a temporal sampling strategy to address data imbalance and improve alignment quality during mini-batch training.
  • Figure 3: Evaluation Task Suites. We evaluate our methods on 6 different tasks in the real world (top) and simulation (bottom) to demonstrate the effectiveness of sim-to-real transfer.
  • Figure 4: (a) Latent Space Visualization. Our OT alignment maps source domain samples (blue) and target domain samples (red) nearby in the latent space, yielding a single, well-mixed cluster. This overlap demonstrates that OT alignment effectively synchronizes cross-domain feature distributions, improving sim-to-real transfer. (b) Out-Of-Distribution Performance. Scaling the number of simulation demonstrations leads to significant OOD success rate gains.
  • Figure 5: Hardware Setup. Our hardware platform uses a Franka Emika Panda robot, with an Intel RealSense D435 camera for capturing image and depth, and a Meta Quest 3 headset for teleoperation.
  • ...and 6 more figures