GRAM: Generalization in Deep RL with a Robust Adaptation Module
James Queeney, Xiaoyi Cai, Alexander Schperberg, Radu Corcodel, Mouhacine Benosman, Jonathan P. How
TL;DR
GRAM addresses the challenge of generalizing deep RL policies to both in-distribution (ID) dynamics seen during training and unseen out-of-distribution (OOD) dynamics. It unifies ID adaptation and OOD robustness through a robust adaptation module based on an epistemic neural network, together with a joint training pipeline that combines teacher-student adaptation with adversarial RL. The key contributions are the robust adaptation module with its GRAM posterior mechanism and a training scheme that jointly optimizes ID and OOD behavior, validated on simulated and real quadruped locomotion. The results show strong ID performance comparable to contextual/domain randomization methods together with robust OOD behavior, enabling effective sim-to-real transfer in diverse environments.
Abstract
The reliable deployment of deep reinforcement learning in real-world settings requires the ability to generalize across a variety of conditions, including both in-distribution scenarios seen during training and novel out-of-distribution scenarios. In this work, we present a framework for dynamics generalization in deep reinforcement learning that unifies these two distinct types of generalization within a single architecture. We introduce a robust adaptation module that provides a mechanism for identifying and reacting to both in-distribution and out-of-distribution environment dynamics, along with a joint training pipeline that combines the goals of in-distribution adaptation and out-of-distribution robustness. Our algorithm GRAM achieves strong generalization performance across in-distribution and out-of-distribution scenarios upon deployment, which we demonstrate through extensive simulation and hardware locomotion experiments on a quadruped robot.
