Can Context Bridge the Reality Gap? Sim-to-Real Transfer of Context-Aware Policies
Marco Iannotta, Yuxuan Yang, Johannes A. Stork, Erik Schaffernicht, Todor Stoyanov
TL;DR
This work tackles the sim-to-real gap in robotic reinforcement learning by conditioning policies on an explicitly learned estimate of the environment dynamics (the context), integrated into a domain-randomization framework. It introduces a context estimator that is trained jointly with the policy in a unified off-policy loop, and evaluates the approach on a classic control task and a real-world pushing task with a Franka Panda. The study compares three supervision strategies for learning the context representation (ground-truth regression, forward dynamics prediction, and policy-loss supervision) and finds that context-aware policies consistently outperform context-agnostic baselines, though the best strategy is task-dependent. The results highlight practical considerations for zero-shot generalization and suggest that context-modulated policies can offer robust transfer in real-world deployments, while also pointing to the computational challenges of evaluating high-dimensional context spaces.
Abstract
Sim-to-real transfer remains a major challenge in reinforcement learning (RL) for robotics, as policies trained in simulation often fail to generalize to the real world due to discrepancies in environment dynamics. Domain Randomization (DR) mitigates this issue by exposing the policy to a wide range of randomized dynamics during training, but this comes at the cost of reduced performance. While standard approaches typically train policies that are agnostic to these variations, we investigate whether sim-to-real transfer can be improved by conditioning the policy on an estimate of the dynamics parameters, referred to as context. To this end, we integrate a context estimation module into a DR-based RL framework and systematically compare state-of-the-art supervision strategies. We evaluate the resulting context-aware policies on both a canonical control benchmark and a real-world pushing task using a Franka Emika Panda robot. Results show that context-aware policies outperform the context-agnostic baseline across all settings, although the best supervision strategy depends on the task.
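To make the idea of a context-aware policy concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the context is estimated from a short window of recent transitions, that both the estimator and the policy are single linear layers, and that the randomized parameters are a mass and a friction value; all names, dimensions, and ranges are hypothetical. It shows only the ground-truth regression variant of supervision.

```python
import math
import random

# All dimensions below are illustrative assumptions, not values from the paper.
STATE_DIM, ACTION_DIM, CONTEXT_DIM, HISTORY_LEN = 3, 1, 2, 5
TRANSITION_DIM = 2 * STATE_DIM + ACTION_DIM  # (state, action, next_state)

def sample_dynamics_params(rng):
    """Domain randomization: draw hypothetical (mass, friction) values for one episode."""
    return [rng.uniform(0.5, 2.0), rng.uniform(0.1, 1.0)]

def init_linear(in_dim, out_dim, rng):
    """Small random weight matrix with out_dim rows; biases omitted for brevity."""
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def apply_linear(weights, x):
    """Matrix-vector product written out with plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def estimate_context(est_weights, history):
    """Map a flattened window of recent transitions to a context vector."""
    flat = [v for transition in history for v in transition]
    return apply_linear(est_weights, flat)

def policy(pi_weights, state, context):
    """Context-aware policy: acts on the state concatenated with the context estimate."""
    return [math.tanh(a) for a in apply_linear(pi_weights, state + context)]

def regression_loss(context_hat, true_params):
    """Ground-truth regression: mean squared error against the simulator's parameters."""
    return sum((c - p) ** 2 for c, p in zip(context_hat, true_params)) / len(true_params)

rng = random.Random(0)
est_weights = init_linear(HISTORY_LEN * TRANSITION_DIM, CONTEXT_DIM, rng)
pi_weights = init_linear(STATE_DIM + CONTEXT_DIM, ACTION_DIM, rng)

# One randomized episode; the transitions here are random placeholders, not simulator output.
true_params = sample_dynamics_params(rng)
history = [[rng.gauss(0.0, 1.0) for _ in range(TRANSITION_DIM)] for _ in range(HISTORY_LEN)]
state = [rng.gauss(0.0, 1.0) for _ in range(STATE_DIM)]

context_hat = estimate_context(est_weights, history)
action = policy(pi_weights, state, context_hat)
loss = regression_loss(context_hat, true_params)
```

In this sketch a context-agnostic baseline would correspond to calling the policy without the context input. The other two supervision strategies named in the TL;DR would replace `regression_loss` with a forward dynamics prediction error or with the policy's own RL loss, which is not shown here.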
