BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators

Fabio Ramos, Rafael Carvalhaes Possas, Dieter Fox

TL;DR

BayesSim addresses the reality gap in robotics by learning a full posterior over simulator parameters from limited real-world observations using likelihood-free inference. It introduces a flexible model, similar to a mixture density network (MDN), with both neural and quasi-Monte Carlo random Fourier features to approximate $q_{\boldsymbol{\theta}}(\boldsymbol{\theta}|\boldsymbol{x})$, and shows how to recover the posterior under mismatched priors. Domain randomization guided by the inferred posterior yields policies that generalize more robustly across parameter variations, outperforming uniform-prior DR in several control tasks. The approach treats simulators as black boxes, enabling principled Bayesian system identification and policy training for improved Sim2Real transfer. Future work would extend it to image-based and end-to-end representations.

Abstract

We introduce BayesSim, a framework for robotics simulations allowing a full Bayesian treatment for the parameters of the simulator. As simulators become more sophisticated and able to represent the dynamics more accurately, fundamental problems in robotics such as motion planning and perception can be solved in simulation and solutions transferred to the physical robot. However, even the most complex simulator might still not be able to represent reality in all its details, due either to inaccurate parametrization or to simplistic assumptions in the dynamic models. BayesSim provides a principled framework to reason about the uncertainty of simulation parameters. Given a black box simulator (or generative model) that outputs trajectories of state and action pairs from unknown simulation parameters, followed by trajectories obtained with a physical robot, we develop a likelihood-free inference method that computes the posterior distribution of simulation parameters. This posterior can then be used in problems where Sim2Real is critical, for example in policy search. We compare the performance of BayesSim in obtaining accurate posteriors in a number of classical control and robotics problems. Results show that the posterior computed from BayesSim can be used for domain randomization, outperforming alternative methods that randomize based on uniform priors.
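
The likelihood-free inference step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy `simulate` function, the prior range, the feature count, and the ridge penalty are all assumptions made for illustration. It also simplifies the model to a single Gaussian whose mean is linear in plain (not quasi-Monte Carlo) random Fourier features, whereas the paper describes a mixture-density-style model. The idea shown is the general one: draw parameters from a prior, run the black-box simulator, fit a conditional density $q(\theta|x)$, and evaluate it at the observed data.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, rng):
    """Hypothetical black-box simulator: parameter -> noisy 2-D summary statistic."""
    clean = np.stack([np.sin(theta), theta ** 2], axis=-1)
    return clean + 0.05 * rng.normal(size=clean.shape)

def make_rff(dim_in, n_features, lengthscale, rng):
    """Return a map x -> random Fourier features approximating an RBF kernel."""
    W = rng.normal(scale=1.0 / lengthscale, size=(dim_in, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def features(x):
        phi = np.sqrt(2.0 / n_features) * np.cos(x @ W + b)
        return np.hstack([phi, np.ones((x.shape[0], 1))])  # append a bias column

    return features

# 1) Draw parameters from a uniform prior and run the simulator on each draw.
theta_train = rng.uniform(0.0, 1.5, size=2000)
x_train = simulate(theta_train, rng)

# 2) Fit q(theta | x) = N(mean(x), sigma^2); the mean is linear in the features
#    and is estimated by ridge regression (a single-component simplification).
features = make_rff(dim_in=2, n_features=200, lengthscale=1.0, rng=rng)
Phi = features(x_train)
ridge = 1e-3
w = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ theta_train)
sigma = np.sqrt(np.mean((Phi @ w - theta_train) ** 2))

# 3) Condition on a "real" observation. Here it is simulated with a known
#    parameter so the estimate can be checked against the truth.
theta_true = 0.8
x_real = simulate(np.array([theta_true]), rng)
post_mean = float(features(x_real)[0] @ w)
post_std = float(sigma)
print(f"posterior ~ N({post_mean:.3f}, {post_std:.3f}^2), true theta = {theta_true}")
```

A single Gaussian cannot represent the multimodal posteriors mentioned in the figure captions. Capturing those would need several mixture components with input-dependent weights, which this sketch leaves out for brevity.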

Paper Structure

This paper contains 18 sections, 21 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Fetch Push and Sliding tasks: the robot has full access to the entire table and multiple iterations with the object (pushing) or one shot at pushing the object to its target (sliding).
  • Figure 2: Example of joint posteriors obtained for the CartPole problem with different parametrizations for length and masspole. The true value is indicated by a star. Note that the joint posteriors capture the multimodality of the problem when two or more explanations seem likely, for example, a longer pole length with a lighter masspole or vice versa.
  • Figure 3: Posteriors recovered by different methods for the Fetch slide problem. Note that BayesSim with random features provides a posterior that is more peaked around the true value.
  • Figure 4: Accumulated rewards for CartPole policies trained with PPO by randomizing over prior and posterior joint densities. Top left: Performance of the policy trained with the prior, over parameter length, with masspole set to its actual value. Top right: Similar to top left, but over multiple masspole values. Bottom left: Performance of the policy trained with the posterior, over parameter length. Bottom right: Similar to bottom left, but over multiple masspole values.
  • Figure 5: Comparison between policies trained on randomizing the prior vs BayesSim posterior for different values of the simulation parameter. Top: Fetch slide problem. Bottom: Fetch push problem.
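
The comparison in Figures 4 and 5, randomizing over a uniform prior versus over the BayesSim posterior, comes down to which distribution supplies the simulator parameters for each training episode. The sketch below shows one way that sampling step might look. The parameter range and the two-component Gaussian mixture (weights, means, standard deviations) are hypothetical values chosen for illustration, not results from the paper, and the policy training loop itself is omitted.

```python
import numpy as np

def sample_uniform_prior(n, low, high, rng):
    """Uniform domain randomization: every value in [low, high] is equally likely."""
    return rng.uniform(low, high, size=n)

def sample_mixture_posterior(n, weights, means, stds, low, high, rng):
    """Draw n values from a 1-D Gaussian mixture, rejecting draws outside [low, high]."""
    samples = np.empty(0)
    while samples.size < n:
        comp = rng.choice(len(weights), size=n, p=weights)  # pick a component per draw
        draw = rng.normal(means[comp], stds[comp])
        samples = np.concatenate([samples, draw[(draw >= low) & (draw <= high)]])
    return samples[:n]

rng = np.random.default_rng(0)
low, high = 0.1, 2.0                  # hypothetical prior range for a pole length
weights = np.array([0.7, 0.3])        # hypothetical posterior mixture weights
means = np.array([0.5, 1.2])          # hypothetical component means
stds = np.array([0.05, 0.1])          # hypothetical component standard deviations

prior_params = sample_uniform_prior(1000, low, high, rng)
post_params = sample_mixture_posterior(1000, weights, means, stds, low, high, rng)

# Each entry would configure the simulator for one training episode.
print(f"prior spread:     {prior_params.std():.3f}")
print(f"posterior spread: {post_params.std():.3f}")
```

Sampling from the posterior concentrates training episodes on parameter values consistent with the real observations while still covering every mode, which is the property the posterior-trained policies in the figures rely on.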