Toward Task Generalization via Memory Augmentation in Meta-Reinforcement Learning
Kaixi Bao, Chenhao Li, Yarden As, Andreas Krause, Marco Hutter
TL;DR
This paper tackles out-of-distribution (OOD) generalization in reinforcement learning by introducing memory augmentation, which couples task-structured experience augmentation with a memory-based context encoder. The authors formalize the setting as a partially observable problem with a latent task context and optimize a policy across a distribution of tasks using the objective $J(\pi) = \mathbb{E}_{T \sim p(T),\, \tau \sim p_{\pi}(\tau \mid T)} \left[ \sum_{t=0}^{\infty} \gamma^t r(s_t, a_t) \right]$. They implement task-structured augmentations and a memory module that infers the context, yielding a single policy that performs well on both in-distribution (ID) and augmented OOD tasks. Experiments across eight legged locomotion tasks, including sim-to-real hardware tests on a quadruped, demonstrate zero-shot generalization to augmented OOD tasks while preserving ID performance and achieving higher sample efficiency than full randomization. The results highlight the practical potential of memory-informed meta-RL for robust, data-efficient adaptation in partially observable and dynamic environments.
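To make the objective concrete, the following is a minimal sketch of a Monte Carlo estimate of $J(\pi)$: sample tasks from $p(T)$, roll out the policy in each, and average the discounted returns. It is not the authors' implementation. The names `discounted_return`, `estimate_objective`, `sample_task`, and `rollout` are hypothetical, the policy is assumed to be folded into the `rollout` callable, and the infinite-horizon sum is assumed to be truncated to a finite episode.

```python
from typing import Any, Callable, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma^t * r_t for one finite reward sequence."""
    total = 0.0
    # Accumulate backwards so each step is a single multiply-add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_objective(
    sample_task: Callable[[], Any],
    rollout: Callable[[Any], Sequence[float]],
    n_tasks: int,
    gamma: float,
) -> float:
    """Monte Carlo estimate of J(pi) over n_tasks sampled tasks.

    Assumptions (not from the paper): `sample_task` draws T ~ p(T), and
    `rollout(T)` runs the policy in task T for one finite episode and
    returns the per-step rewards.
    """
    if n_tasks <= 0:
        raise ValueError("n_tasks must be positive")
    returns = [
        discounted_return(rollout(sample_task()), gamma) for _ in range(n_tasks)
    ]
    return sum(returns) / n_tasks
```

In practice the estimate would use many tasks and episodes per task; the sketch only illustrates how the two expectations (over tasks and over trajectories) collapse into a single sample average.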
Abstract
Agents trained via reinforcement learning (RL) often struggle to perform well on tasks that differ from those encountered during training. This limitation presents a challenge to the broader deployment of RL in diverse and dynamic task settings. In this work, we introduce memory augmentation, a memory-based RL approach to improve task generalization. Our approach leverages task-structured augmentations to simulate plausible out-of-distribution scenarios and incorporates memory mechanisms to enable context-aware policy adaptation. Trained on a predefined set of tasks, our policy demonstrates the ability to generalize to unseen tasks through memory augmentation without requiring additional interactions with the environment. Through extensive simulation experiments and real-world hardware evaluations on legged locomotion tasks, we demonstrate that our approach achieves zero-shot generalization to unseen tasks while maintaining robust in-distribution performance and high sample efficiency.
