Can Context Bridge the Reality Gap? Sim-to-Real Transfer of Context-Aware Policies

Marco Iannotta, Yuxuan Yang, Johannes A. Stork, Erik Schaffernicht, Todor Stoyanov

TL;DR

This work tackles the sim-to-real gap in robotic reinforcement learning by conditioning policies on an explicit, learned dynamics context, integrated into a domain-randomization framework. It introduces a context estimator that is trained together with the policy in a unified off-policy loop, and evaluates it on a classic control task and a real-world pushing task on a Franka Panda. The study compares three supervision strategies for learning the context representation (ground-truth regression, forward dynamics prediction, and policy-loss supervision) and finds that context-aware policies consistently outperform context-agnostic baselines, though the best strategy is task-dependent. The results highlight practical considerations for zero-shot generalization and suggest that context-modulated policies can offer robust transfer in real-world deployments, while also pointing to the computational challenges of evaluating high-dimensional context spaces.

Abstract

Sim-to-real transfer remains a major challenge in reinforcement learning (RL) for robotics, as policies trained in simulation often fail to generalize to the real world due to discrepancies in environment dynamics. Domain Randomization (DR) mitigates this issue by exposing the policy to a wide range of randomized dynamics during training, but at the cost of reduced performance. While standard approaches typically train policies agnostic to these variations, we investigate whether sim-to-real transfer can be improved by conditioning the policy on an estimate of the dynamics parameters, referred to as context. To this end, we integrate a context estimation module into a DR-based RL framework and systematically compare state-of-the-art supervision strategies. We evaluate the resulting context-aware policies in both a canonical control benchmark and a real-world pushing task using a Franka Emika Panda robot. Results show that context-aware policies outperform the context-agnostic baseline across all settings, although the best supervision strategy depends on the task.
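
To make the idea of a context-aware policy concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a linear context estimator, a linear policy, and a linear forward model, all of which are hypothetical stand-ins for the learned networks; every function name, dimension, and parameter range is illustrative. It shows one domain-randomized context sample, a context estimate computed from a short transition history, a policy action conditioned on that estimate, and two of the supervision signals named above (ground-truth regression and forward dynamics prediction). Policy-loss supervision would instead backpropagate the RL objective through the estimator, which requires an autodiff framework and is omitted here.

```python
import numpy as np

def sample_context(rng, low, high):
    # Domain randomization: draw one dynamics-parameter vector per episode.
    return rng.uniform(low, high)

def estimate_context(history, W):
    # Hypothetical linear estimator: flattened (T, d) history -> context estimate.
    return history.reshape(-1) @ W

def policy_action(state, context_hat, K):
    # Context-aware policy: the action depends on the state and the context estimate.
    return np.concatenate([state, context_hat]) @ K

def regression_loss(context_hat, context_true):
    # Ground-truth regression: only possible in simulation, where the context is known.
    return float(np.mean((context_hat - context_true) ** 2))

def forward_prediction_loss(state, action, next_state, context_hat, F):
    # Forward dynamics prediction with a hypothetical linear forward model F.
    pred = np.concatenate([state, action, context_hat]) @ F
    return float(np.mean((pred - next_state) ** 2))

# Illustrative dimensions and randomization ranges (assumptions, not from the paper).
rng = np.random.default_rng(0)
state_dim, action_dim, context_dim, T = 3, 1, 2, 4
low, high = np.array([0.5, 0.1]), np.array([2.0, 1.0])

context_true = sample_context(rng, low, high)
history = rng.normal(size=(T, state_dim + action_dim))
W = rng.normal(size=(T * (state_dim + action_dim), context_dim))
K = rng.normal(size=(state_dim + context_dim, action_dim))
F = rng.normal(size=(state_dim + action_dim + context_dim, state_dim))

state = rng.normal(size=state_dim)
next_state = rng.normal(size=state_dim)  # placeholder for a simulator step
context_hat = estimate_context(history, W)
action = policy_action(state, context_hat, K)

l_reg = regression_loss(context_hat, context_true)
l_fwd = forward_prediction_loss(state, action, next_state, context_hat, F)
```

A context-agnostic baseline would correspond to dropping `context_hat` from the policy input, so the same policy must cope with every sampled context without knowing which one it faces.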

Paper Structure

This paper contains 13 sections, 9 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Setup employed for the task in the real-world evaluation (pushing a box to a desired location), featuring the Franka Emika Panda robot and its digital twin in the AGX Dynamics simulation.
  • Figure 2: Illustration of the planar pushing task, showing the object and end-effector positions at two consecutive time steps. Solid blue dots represent the object positions, while dashed blue dots represent its center of mass. Red circles represent the position of the cylindrical tool mounted on the robot hand, which is used to push the object. Green circles represent the target end-effector positions issued to the Cartesian Impedance Controller to command the robot end-effector.
  • Figure 3: Context configurations used for real-world evaluation of the pushing task with center of mass variation, obtained by combining 3 surface materials with 4 box variants. The surface materials differ in their friction coefficients, while the boxes vary in both mass and mass distribution, achieved by redistributing the filling material using internal separators.
  • Figure 4: Comparison of the best-performing FP and PL policies on the pushing task without center of mass variation. Dashed circles denote the success threshold. Left: successful box trajectories (faded lines) and their averages (bold lines); legend values denote the average number of steps and the percentage of successful trials. Right: end box positions of failed trajectories (faded dots) and their centroids (bold crosses); legend values denote the average final distance to the goal and the percentage of failures.