Domain Randomization via Entropy Maximization

Gabriele Tiboni, Pascal Klink, Jan Peters, Tatiana Tommasi, Carlo D'Eramo, Georgia Chalvatzaki

TL;DR

This paper proposes a novel approach to sim-to-real transfer that automatically shapes dynamics distributions during training in simulation, without requiring real-world data. It introduces DOmain RAndomization via Entropy MaximizatiON (DORAEMON), a constrained optimization problem that directly maximizes the entropy of the training distribution while retaining generalization capabilities.

Abstract

Varying dynamics parameters in simulation is a popular Domain Randomization (DR) approach for overcoming the reality gap in Reinforcement Learning (RL). Nevertheless, DR heavily hinges on the choice of the sampling distribution of the dynamics parameters, since high variability is crucial to regularize the agent's behavior but notoriously leads to overly conservative policies when randomizing excessively. In this paper, we propose a novel approach to address sim-to-real transfer, which automatically shapes dynamics distributions during training in simulation without requiring real-world data. We introduce DOmain RAndomization via Entropy MaximizatiON (DORAEMON), a constrained optimization problem that directly maximizes the entropy of the training distribution while retaining generalization capabilities. In achieving this, DORAEMON gradually increases the diversity of sampled dynamics parameters as long as the probability of success of the current policy is sufficiently high. We empirically validate the consistent benefits of DORAEMON in obtaining highly adaptive and generalizable policies, i.e. solving the task at hand across the widest range of dynamics parameters, as opposed to representative baselines from the DR literature. Notably, we also demonstrate the Sim2Real applicability of DORAEMON through its successful zero-shot transfer in a robotic manipulation setup under unknown real-world parameters.
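
The core idea in the abstract, widening the training distribution only while the policy's success probability stays sufficiently high, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the dynamics parameter is one-dimensional, the training distribution is uniform (so its entropy is simply the log of its width), `success` is a hypothetical stand-in for a policy rollout, and a greedy widening loop replaces the constrained optimization problem that DORAEMON actually solves.

```python
import math
import random


def success(xi: float) -> bool:
    """Hypothetical stand-in for a rollout: the 'policy' succeeds iff xi is in [0.5, 1.5]."""
    return 0.5 <= xi <= 1.5


def uniform_entropy(width: float) -> float:
    """Differential entropy of a uniform distribution of the given width."""
    return math.log(width)


def estimate_success_rate(center, width, n_rollouts, rng):
    """Monte Carlo estimate of the success probability under U(center - w/2, center + w/2)."""
    lo, hi = center - width / 2, center + width / 2
    hits = sum(success(rng.uniform(lo, hi)) for _ in range(n_rollouts))
    return hits / n_rollouts


def widen_while_successful(alpha=0.8, center=1.0, width=0.2, step=0.05,
                           n_rollouts=2000, max_iters=200, seed=0):
    """Greedily increase entropy (width) while the estimated success rate stays >= alpha.

    This greedy loop is an illustrative simplification, not the paper's optimizer.
    """
    rng = random.Random(seed)
    history = [(width, uniform_entropy(width))]
    for _ in range(max_iters):
        candidate = width + step
        rate = estimate_success_rate(center, candidate, n_rollouts, rng)
        if rate < alpha:
            break  # widening further would violate the success-rate constraint
        width = candidate
        history.append((width, uniform_entropy(width)))
    return width, history


final_width, history = widen_while_successful()
print(f"final width: {final_width:.2f}, entropy: {history[-1][1]:.3f}")
```

In this toy setting the success region has width 1, so the widest uniform distribution that still satisfies the constraint at `alpha = 0.8` has width 1 / 0.8 = 1.25; the loop should stop near that value, up to Monte Carlo noise and the step size.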

Paper Structure (28 sections, 6 equations, 17 figures, 3 tables, 1 algorithm)

Figures (17)

  • Figure 1: DORAEMON's moving Beta distributions over the plane inclination angle $\omega$, for different values of in-distribution success rate $\alpha$. The converged "max-entropy" distribution is such that the policy can solve the task with probability $\alpha$ (green for success, red for failure). The physically infeasible dynamics region is highlighted with a red background.
  • Figure 1: PandaPush task: success rate and final distance of the box from the goal (cm), tested on the maximum-entropy distribution and averaged over $10000$ rollouts for Sim2Sim and $30$ rollouts for Sim2Real. The task is solved successfully if the agent pushes the box within a 3 cm radius of the goal.
  • Figure 2: Sim-to-Sim results: global success rate computed on the maximum-entropy uniform distribution (top) and entropy of the current training DR distribution (bottom). The number of randomized parameter dimensions is reported in parentheses (see Table \ref{tab:parameter_specs} for details).
  • Figure 4: Analysis on the impact of the hyperparameter $\alpha$ (a), and of the provided lower bound return threshold $J_\text{LB}$ for defining success (b) in the Hopper environment.
  • Figure 5: PandaPush setup: the 7-DoF robot arm needs to push a box with a varying center of mass to a desired location.
  • ...and 12 more figures