Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience

Yevgen Chebotar, Ankur Handa, Viktor Makoviychuk, Miles Macklin, Jan Issac, Nathan Ratliff, Dieter Fox

TL;DR

This work tackles the sim-to-real transfer problem by automatically adapting the distribution of simulated parameters to better match real-world policy behavior, thereby closing the reality gap without exact scene replication. The proposed SimOpt framework learns a Gaussian distribution over simulation parameters and updates it via a KL-divergence constrained, gradient-free optimization guided by a discrepancy between real and simulated observations, using partial real-world data. Implemented on a GPU-accelerated pipeline with NVIDIA Flex and PPO, SimOpt is validated on two real-robot tasks (swing-peg-in-hole and drawer opening) with ABB Yumi and Franka Panda, demonstrating transfer after only a few iterations and a small number of real-world roll-outs. The results show that automated, data-driven adaptation of simulation randomization yields more reliable real-world policy transfer than wide, manually designed randomization, suggesting a practical path toward robust sim-to-real robotics. Future work includes extending to multi-modal parameter distributions and incorporating richer sensor data such as vision and touch.
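
The adaptation loop summarized above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: `toy_sim` is a hypothetical two-parameter simulator, `TRUE_THETA` stands in for the unknown real-world parameters, the discrepancy is a plain squared L2 distance, and the KL-bounded update is a simple exponentially weighted refit with a bisection on the temperature. It is a stand-in for the paper's optimizer, not a reproduction of it, and it omits policy retraining between iterations.

```python
import numpy as np

TRUE_THETA = np.array([1.0, 0.5])  # hidden "real-world" parameters (toy assumption)

def toy_sim(theta):
    """Hypothetical simulator: maps 2 parameters to a 3-D observation vector."""
    a, b = theta
    return np.array([a + b, a * b, a - 2.0 * b])

def discrepancy(sim_obs, real_obs):
    """Squared L2 distance between simulated and real observations (assumed form)."""
    return float(np.sum((sim_obs - real_obs) ** 2))

def gaussian_kl(mu_new, cov_new, mu_old, cov_old):
    """KL(N_new || N_old) between two multivariate Gaussians."""
    d = mu_old.shape[0]
    inv_old = np.linalg.inv(cov_old)
    diff = mu_old - mu_new
    _, logdet_old = np.linalg.slogdet(cov_old)
    _, logdet_new = np.linalg.slogdet(cov_new)
    return 0.5 * (np.trace(inv_old @ cov_new) + diff @ inv_old @ diff
                  - d + logdet_old - logdet_new)

def weighted_fit(samples, costs, temperature, jitter=1e-6):
    """Refit a Gaussian to samples weighted by exp(-cost / temperature)."""
    w = np.exp(-(costs - costs.min()) / temperature)
    w /= w.sum()
    mu = w @ samples
    centered = samples - mu
    cov = (centered * w[:, None]).T @ centered + jitter * np.eye(samples.shape[1])
    return mu, cov

def kl_constrained_update(mu, cov, samples, costs, epsilon, n_bisect=50):
    """Gradient-free update: pick the greediest temperature whose refit
    stays within KL <= epsilon of the current distribution."""
    lo, hi = 1e-6, 1e6  # temperature bracket, searched in log space
    for _ in range(n_bisect):
        mid = np.sqrt(lo * hi)
        mu_new, cov_new = weighted_fit(samples, costs, mid)
        if gaussian_kl(mu_new, cov_new, mu, cov) > epsilon:
            lo = mid  # step too large: soften the weights
        else:
            hi = mid  # step allowed: try a greedier temperature
    return weighted_fit(samples, costs, hi)

def run_toy_loop(n_iters=15, n_samples=300, epsilon=1.0, seed=0):
    """Iterate: sample parameters, score them against the 'real' observation,
    and update the Gaussian under the KL bound."""
    rng = np.random.default_rng(seed)
    mu, cov = np.zeros(2), np.eye(2)  # initial parameter distribution
    real_obs = toy_sim(TRUE_THETA)    # stands in for real-robot roll-outs
    for _ in range(n_iters):
        samples = rng.multivariate_normal(mu, cov, size=n_samples)
        costs = np.array([discrepancy(toy_sim(t), real_obs) for t in samples])
        mu, cov = kl_constrained_update(mu, cov, samples, costs, epsilon)
    return mu, cov

mu, cov = run_toy_loop()
print("estimated parameter mean:", mu)
```

The KL bound keeps each update close to the previous distribution, so a handful of noisy roll-outs cannot collapse the Gaussian onto a single sample; `epsilon` trades adaptation speed against that stability.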

Abstract

We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real-world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve the policy transfer by matching the policy behavior in simulation and the real world. We show that policies trained with our method are able to reliably transfer to different robots in two real-world tasks: swing-peg-in-hole and opening a cabinet drawer. A video of our experiments is available at https://sites.google.com/view/simopt

Paper Structure

This paper contains 19 sections, 4 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 2: The pipeline for optimizing the simulation parameter distribution. After training a policy on the current distribution, we sample the policy both in the real world and for a range of parameters in simulation. The discrepancy between the simulated and real observations is used to update the simulation parameter distribution in SimOpt.
  • Figure 3: An example of a wide distribution of simulation parameters in the swing-peg-in-hole task where it is not possible to find a solution for many of the task instances.
  • Figure 4: Performance of the policy training with standard domain randomization for different variances of the distribution of the cabinet position along the X-axis in the drawer opening task.
  • Figure 5: The initial distribution of the cabinet position in the source environment, located at the extreme left, gradually shifts toward the target environment distribution over 5 iterations of SimOpt.
  • Figure 6: Policy performance in the target drawer opening environment trained on randomized simulation parameters at different iterations of SimOpt. As the source environment distribution gets adjusted, the policy transfer improves until the robot can successfully solve the task in the fourth SimOpt iteration.
  • ...and 3 more figures