A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models

Yilin Wang, Shangzhe Li, Haoyi Niu, Zhiao Huang, Weitong Zhang, Hao Su

TL;DR

This work tackles imitation learning for robotic manipulation when real-world expert data and rewards are scarce. It introduces a three-stage sim-to-real pipeline that pretrains a latent world model via online imitation in simulation, using a CDRED reward to align expert and behavioral data, and then finetunes offline on a small real-world dataset. Empirically, the approach improves success rates by at least 31.7% in sim-to-sim transfer and 23.3% in sim-to-real transfer over offline baselines, with stronger out-of-distribution generalization and better data coverage attributed to online exploration. The method offers a practical path to robust, data-efficient domain transfer for manipulation tasks under reward-free conditions.

Abstract

We are interested in solving the problem of imitation learning with a limited amount of real-world expert data. Existing offline imitation methods often struggle with poor data coverage and severe performance degradation. We propose a solution that leverages robot simulators to achieve online imitation learning. Our sim-to-real framework is based on world models and combines online imitation pretraining with offline finetuning. By leveraging online interactions, our approach alleviates the data coverage limitations of offline methods, leading to improved robustness and reduced performance degradation during finetuning. It also enhances generalization during domain transfer. Our empirical results demonstrate its effectiveness, improving success rates by at least 31.7% in sim-to-sim transfer and 23.3% in sim-to-real transfer over existing offline imitation learning baselines.

Paper Structure

This paper contains 30 sections, 6 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Sim-to-Real Pipeline with Online Imitation Pretraining. We illustrate the pretraining and finetuning pipeline of our proposed method. During pretraining, the world model is trained using a reward signal from a jointly trained discriminator that distinguishes expert from online interaction data. In the finetuning phase, the pretrained encoder and policy are refined using a small dataset of real-world expert demonstrations.
  • Figure 2: State Pre-processing during Real Deployment. We illustrate the state pre-processing pipeline used during real-world deployment. The object pose is estimated from RGBD camera observations using SAM2 and FoundationPose. This estimate is then combined with the robot’s proprioceptive states to form the input to the world model, which in turn generates actions for robotic control.
  • Figure 3: Illustrations of Tasks. The first three rows show real-world demonstrations of the Cabinet Open, Pick Cube, and Push Cube tasks, while the last row presents the initial and final successful states of the Peg Insertion, Roll Ball, and Lift Peg Upright tasks in simulation.
  • Figure 4: Illustration of Robots. We show the two robot types used in our experiments: our customized robot and the Franka Panda. Since the Franka Panda arm is only used in simulation, we provide its simulated rendering, whereas for our customized robot we show a real-world photo.
  • Figure 5: Domain Gap in Sim-to-Sim Domain Transfer. We illustrate the two types of domain gaps considered in our sim-to-sim transfer experiments. During finetuning in the target domain, the policy must adapt to either biased pose estimation or a shifted rail position.
  • ...and 1 more figure
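
The state pre-processing step described in the Figure 2 caption (an estimated object pose combined with the robot's proprioceptive states to form the world-model input) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names `normalize_quaternion` and `build_world_model_input`, the pose layout (3D position plus a 4D quaternion), and the ordering of the concatenated vector are all assumptions, since the summary does not specify them. The pose itself is assumed to come from an upstream estimator such as the SAM2 and FoundationPose stage, which is not modeled here.

```python
import math
from typing import List, Sequence


def normalize_quaternion(q: Sequence[float]) -> List[float]:
    """Scale a quaternion to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("quaternion must be non-zero")
    return [c / norm for c in q]


def build_world_model_input(
    object_position: Sequence[float],
    object_quaternion: Sequence[float],
    proprio: Sequence[float],
) -> List[float]:
    """Concatenate an estimated object pose with proprioceptive states.

    Assumed layout (not taken from the paper):
    [x, y, z, 4 unit-quaternion components, proprio...].
    """
    if len(object_position) != 3:
        raise ValueError("object_position must have 3 elements")
    if len(object_quaternion) != 4:
        raise ValueError("object_quaternion must have 4 elements")
    return [
        *(float(v) for v in object_position),
        *normalize_quaternion(object_quaternion),
        *(float(v) for v in proprio),
    ]


if __name__ == "__main__":
    # Made-up values: a pose estimate plus seven joint positions.
    state = build_world_model_input(
        [0.1, 0.2, 0.3], [0.0, 0.0, 0.0, 2.0], [0.0] * 7
    )
    print(len(state), state[:7])
```

Normalizing the quaternion before concatenation is a design choice of this sketch, intended to keep a slightly noisy pose estimate on the unit sphere before it reaches the model.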