
Toward Task Generalization via Memory Augmentation in Meta-Reinforcement Learning

Kaixi Bao, Chenhao Li, Yarden As, Andreas Krause, Marco Hutter

TL;DR

This paper tackles out-of-distribution (OOD) generalization in reinforcement learning by introducing memory augmentation, an approach that couples task-structured experience augmentation with a memory-based context encoder. The authors formalize the setting as a partially observable problem with a latent task context and optimize a single policy across a distribution of tasks using the objective $J(\pi) = \mathbb{E}_{T \sim p(T),\, \tau \sim p_{\pi}(\tau \mid T)} \left[ \sum_{t=0}^{\infty} \gamma^t r(s_t, a_t) \right]$. They implement task-structured augmentations together with a memory module that infers the task context, yielding a unified policy that performs well on both in-distribution (ID) and augmented OOD tasks. Experiments across eight legged locomotion tasks, including sim-to-real hardware tests on a quadruped, demonstrate zero-shot generalization to augmented OOD tasks while preserving ID performance and achieving higher sample efficiency than full randomization. The results highlight the practical potential of memory-informed meta-RL for robust, data-efficient adaptation in partially observable and dynamic environments.
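
To make the objective concrete: $J(\pi)$ averages the discounted return over tasks drawn from $p(T)$ and over trajectories the policy generates in each task. The Python sketch below estimates it by Monte Carlo. It is illustrative only: `sample_task` and `rollout` are hypothetical placeholders for a task sampler and an environment rollout under $\pi$, and the infinite sum is truncated at the rollout length.

```python
import random
from typing import Callable, List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute sum_t gamma^t * r_t for a finite (truncated) reward sequence."""
    g = 0.0
    # Backward accumulation: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_objective(
    sample_task: Callable[[random.Random], object],
    rollout: Callable[[object, random.Random], List[float]],
    gamma: float,
    n_tasks: int,
    n_rollouts: int,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of J(pi): mean discounted return over T ~ p(T)
    and tau ~ p_pi(tau | T). The policy pi is assumed to live inside `rollout`."""
    rng = random.Random(seed)
    returns = []
    for _ in range(n_tasks):
        task = sample_task(rng)  # T ~ p(T)
        for _ in range(n_rollouts):
            rewards = rollout(task, rng)  # rewards along tau ~ p_pi(tau | T)
            returns.append(discounted_return(rewards, gamma))
    return sum(returns) / len(returns)


if __name__ == "__main__":
    # Toy stand-in: a "task" is a target speed, and a fixed policy that always
    # moves at speed 1.0 earns reward -|target - 1.0| at each of 50 steps.
    sample_task = lambda rng: rng.uniform(0.5, 1.5)
    rollout = lambda task, rng: [-abs(task - 1.0)] * 50
    print(estimate_objective(sample_task, rollout, gamma=0.99, n_tasks=100, n_rollouts=4))
```

The backward accumulation avoids recomputing powers of $\gamma$ and mirrors the recursive definition of the return used by most RL implementations.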

Abstract

Agents trained via reinforcement learning (RL) often struggle to perform well on tasks that differ from those encountered during training. This limitation presents a challenge to the broader deployment of RL in diverse and dynamic task settings. In this work, we introduce memory augmentation, a memory-based RL approach to improve task generalization. Our approach leverages task-structured augmentations to simulate plausible out-of-distribution scenarios and incorporates memory mechanisms to enable context-aware policy adaptation. Trained on a predefined set of tasks, our policy demonstrates the ability to generalize to unseen tasks through memory augmentation without requiring additional interactions with the environment. Through extensive simulation experiments and real-world hardware evaluations on legged locomotion tasks, we demonstrate that our approach achieves zero-shot generalization to unseen tasks while maintaining robust in-distribution performance and high sample efficiency.
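
In the legged setting, one plausible way to realize a task-structured augmentation $g = (g_o, g_a)$ is a morphological symmetry of the robot: Figure 5 pairs embeddings of left-front (LF) joint failures with augmented embeddings for the RF, LH, and RH legs. The sketch below is a minimal illustration under explicit assumptions, not the authors' implementation. The 12-joint leg-major ordering, the restriction to joint-space quantities (base velocities and similar terms would need their own mirroring), and the convention that only the HAA joints change sign under a left-right mirror are all hypothetical, as are the names `make_joint_transform`, `apply_transform`, and `augment_rollout`.

```python
import numpy as np

LEGS = ("LF", "RF", "LH", "RH")
JOINTS = ("HAA", "HFE", "KFE")
N_JOINTS = len(LEGS) * len(JOINTS)  # 12 actuated joints


def joint_index(leg: str, joint: str) -> int:
    # Hypothetical leg-major ordering: [LF_HAA, LF_HFE, LF_KFE, RF_HAA, ...]
    return LEGS.index(leg) * len(JOINTS) + JOINTS.index(joint)


def make_joint_transform(leg_map, flip_joints=("HAA",)):
    """Return (perm, sign) such that out[dst] = sign[dst] * x[perm[dst]].

    leg_map must be a bijection sending each source leg to the leg slot it
    occupies after the transform; joints in flip_joints change sign
    (an assumed sign convention, robot-specific in practice)."""
    perm = np.empty(N_JOINTS, dtype=int)
    sign = np.ones(N_JOINTS)
    for leg in LEGS:
        for joint in JOINTS:
            src, dst = joint_index(leg, joint), joint_index(leg_map[leg], joint)
            perm[dst] = src
            if joint in flip_joints:
                sign[dst] = -1.0
    return perm, sign


def apply_transform(perm, sign, x):
    """Apply the transform along the last axis (single step or (T, 12) sequence)."""
    return sign * x[..., perm]


# Left-right mirror: experience with an LF joint failure becomes an RF failure.
MIRROR_LR = make_joint_transform({"LF": "RF", "RF": "LF", "LH": "RH", "RH": "LH"})


def augment_rollout(joint_obs, actions, transform=MIRROR_LR):
    """g = (g_o, g_a): here both components reuse the same joint-space map."""
    perm, sign = transform
    return apply_transform(perm, sign, joint_obs), apply_transform(perm, sign, actions)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.normal(size=(5, N_JOINTS))
    act = rng.normal(size=(5, N_JOINTS))
    obs_g, act_g = augment_rollout(obs, act)
    # The original LF hip abduction reading appears, negated, in the RF slot.
    print(obs[0, joint_index("LF", "HAA")], obs_g[0, joint_index("RF", "HAA")])
```

Representing $g$ as a permutation plus a sign vector keeps it cheap to apply to whole trajectories; mappings to the hind legs (LH, RH) would follow the same pattern but depend on sign conventions that are not specified here.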


Paper Structure

This paper contains 22 sections, 6 equations, 9 figures, 6 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Overview of our training framework.
  • Figure 2: Memory augmentation. Transformation $g = (g_o, g_a) \in \mathcal{G}$ is applied to observations $o_t$ and actions $a_t$ to generate augmented observations $o_t^g$ and actions $a_t^g$. The augmented observation sequence is passed forward through the RNN, producing hidden states $h_t^g$. During the $k$-th policy update, the initial augmented hidden state $h_0^{g,k}$ is set to the last hidden state from the previous update, $h_T^{g,k-1}$, ensuring continuity and context retention across updates.
  • Figure 3: Evaluation of quadruped position tracking under joint failure.
  • Figure 4: Normalized mean episodic returns on ID and OOD tasks. Memory-Aug generalizes well to OOD tasks while preserving strong ID performance across all experiments.
  • Figure 5: PCA visualization of latent task embeddings $z$ (LF, in red) and $z_g$ (RF, LH, RH, in blue) for joint failures across different joint types (HAA, HFE and KFE). The distinct clustering of tasks suggests that the learned latent space effectively captures task-specific features.
  • ...and 4 more figures
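
The hidden-state handling described in the Figure 2 caption can be sketched with a minimal recurrent encoder. This is an illustrative sketch, not the authors' code: `TinyRNN` is a hypothetical plain tanh RNN standing in for the recurrent encoder (whose cell type is not stated here), the very first hidden state is assumed to be zero, and the input chunks are assumed to be non-empty, already-augmented observation sequences $o_t^g$, one chunk per policy update $k$.

```python
import numpy as np


class TinyRNN:
    """Minimal Elman RNN: h_t = tanh(W_x o_t + W_h h_{t-1} + b).
    A hypothetical stand-in for the recurrent context encoder."""

    def __init__(self, obs_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(scale=0.1, size=(hidden_dim, obs_dim))
        self.W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.b = np.zeros(hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, obs_seq: np.ndarray, h0: np.ndarray) -> np.ndarray:
        """Return hidden states of shape (T, hidden_dim) for a (T, obs_dim) sequence."""
        h = h0
        hs = []
        for o in obs_seq:
            h = np.tanh(self.W_x @ o + self.W_h @ h + self.b)
            hs.append(h)
        return np.stack(hs)


def encode_across_updates(rnn: TinyRNN, augmented_chunks):
    """Encode one augmented chunk per policy update k, initializing
    h_0^{g,k} with the last hidden state h_T^{g,k-1} of the previous update."""
    h0 = np.zeros(rnn.hidden_dim)  # h_0^{g,1}: assumed zero at the very start
    outputs = []
    for chunk in augmented_chunks:
        hs = rnn.forward(chunk, h0)
        outputs.append(hs)
        h0 = hs[-1]  # carry the inferred context into the next update
    return outputs


if __name__ == "__main__":
    rnn = TinyRNN(obs_dim=12, hidden_dim=16)
    rng = np.random.default_rng(0)
    chunks = [rng.normal(size=(24, 12)) for _ in range(3)]  # three updates, 24 steps each
    hidden = encode_across_updates(rnn, chunks)
    print([h.shape for h in hidden])
```

Because the last hidden state is carried forward, encoding the chunks one update at a time reproduces the hidden states of a single pass over the concatenated sequence, whereas resetting `h0` to zero at every update would discard the context inferred so far. In an autodiff framework one would typically also detach the carried state so that gradients do not flow across update boundaries; whether the authors do so is not stated here.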