Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning

Tidiane Camaret Ndir; André Biedenkapp; Noor Awad

TL;DR

This work argues that understanding and utilizing contextual cues, such as the gravity level of the environment, is critical for robust generalization, and proposes to integrate the learning of context representations directly with policy learning.

Abstract

In this work, we address the challenge of zero-shot generalization (ZSG) in Reinforcement Learning (RL), where agents must adapt to entirely novel environments without additional training. We argue that understanding and utilizing contextual cues, such as the gravity level of the environment, is critical for robust generalization, and we propose to integrate the learning of context representations directly with policy learning. Our algorithm demonstrates improved generalization on various simulated domains, outperforming prior context-learning techniques in zero-shot settings. By jointly learning policy and context, our method acquires behavior-specific context representations, enabling adaptation to unseen environments and marking progress toward reinforcement learning systems that generalize across diverse real-world tasks. Our code and experiments are available at https://github.com/tidiane-camaret/contextual_rl_zero_shot.
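To make the difference between two-phase and joint context learning concrete, here is a toy sketch in Python with NumPy. It is not the authors' implementation (their code is in the repository linked above). The linear encoder with mean pooling over past transitions, the linear value head standing in for the critic $Q_\phi$, the gravity toy data, and every function name are hypothetical. With `joint=True` the value loss also updates the encoder $\psi$, as in joint training; with `joint=False` the encoder stays frozen, mimicking only the policy-learning phase of two-phase training (the predictive pretraining phase is omitted).

```python
import numpy as np


def make_toy_envs(gravities, n_transitions=16, dt=0.1, seed=0):
    """Hypothetical toy data. Each environment yields transitions (v, dv) with
    dv = -g * dt + noise, so gravity g is observable only through experience.
    The regression target (a stand-in for a value estimate) is 0.1 * g."""
    rng = np.random.default_rng(seed)
    transitions, states, targets = [], [], []
    for g in gravities:
        v = rng.normal(size=n_transitions)
        dv = -g * dt + 0.01 * rng.normal(size=n_transitions)
        transitions.append(np.stack([v, dv], axis=1))
        states.append(np.array([1.0]))  # constant state feature acting as a bias
        targets.append(0.1 * g)
    return transitions, np.array(states), np.array(targets)


def predict(params, trans, s):
    W_e, w_s, w_z = params
    # Encode the transition set; the encoder is linear, so projecting then
    # mean-pooling equals mean-pooling then projecting.
    z = W_e @ trans.mean(axis=0)
    return w_s @ s + w_z @ z


def value_loss(params, transitions, states, targets):
    errs = [predict(params, t, s) - y for t, s, y in zip(transitions, states, targets)]
    return 0.5 * float(np.mean(np.square(errs)))


def train(transitions, states, targets, joint, steps=500, lr=0.05, d_ctx=2, seed=0):
    rng = np.random.default_rng(seed)
    W_e = rng.normal(scale=0.1, size=(d_ctx, transitions[0].shape[1]))  # encoder psi
    w_s = np.zeros(states.shape[1])          # value-head weights on the state
    w_z = rng.normal(scale=0.1, size=d_ctx)  # value-head weights on the embedding z
    n = len(targets)
    for _ in range(steps):
        g_We, g_ws, g_wz = np.zeros_like(W_e), np.zeros_like(w_s), np.zeros_like(w_z)
        for t, s, y in zip(transitions, states, targets):
            m = t.mean(axis=0)
            z = W_e @ m
            err = w_s @ s + w_z @ z - y      # dL/dq for L = 0.5 * (q - y)^2
            g_ws += err * s
            g_wz += err * z
            g_We += np.outer(err * w_z, m)  # gradient that reaches the encoder
        w_s -= lr * g_ws / n
        w_z -= lr * g_wz / n
        if joint:  # joint training: the value loss also updates the encoder
            W_e -= lr * g_We / n
    return W_e, w_s, w_z


envs = make_toy_envs(gravities=[5.0, 8.0, 11.0, 14.0])
init = train(*envs, joint=True, steps=0)  # untrained parameters
joint = train(*envs, joint=True)          # encoder and value head updated together
frozen = train(*envs, joint=False)        # encoder frozen, only the head is trained
```

Because `W_e` enters the loss only through `w_z @ (W_e @ m)`, the encoder gradient is the outer product of `err * w_z` with the pooled transition features. The frozen variant simply discards that gradient, which is the situation Figure 1 describes for the policy-learning phase.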

Paper Structure (18 sections, 1 equation, 11 figures, 6 tables, 2 algorithms)

Introduction
Related Work
Meta-RL
Contextual RL
Background - Contextual Markov Decision Processes
Method
Inferring Context From Past Experiences
The Case for Behavior-Specific Context for Zero-Shot Generalization
Joint Context and Policy Learning
Experiments
Environments
Baseline methods
Measures of generalization
Research questions
Research Question 1:
...and 3 more sections
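The Background section listed above concerns contextual Markov decision processes. As a hedged reminder in common notation (only $\psi$ and $\pi_\theta$ appear in the figure captions; the remaining symbols and the choice of $k$ past transitions as encoder input are assumptions and may not match the paper): each context $c$, such as a gravity level, indexes an MDP with shared state and action spaces but context-dependent dynamics and rewards; an encoder summarizes past experience into an embedding $z$ that conditions the policy; and zero-shot generalization is measured on contexts from a test distribution, inside or outside the training range, without further updates.

```latex
\begin{aligned}
\mathcal{M}_c &= (\mathcal{S}, \mathcal{A}, \mathcal{T}_c, \mathcal{R}_c, \rho, \gamma), && c \in \mathcal{C},\\
z &= \psi\big(\{(s_i, a_i, s'_i)\}_{i=1}^{k}\big), && a \sim \pi_\theta(\cdot \mid s, z),\\
J(\theta, \psi) &= \mathbb{E}_{c \sim p_{\text{test}}}\,\mathbb{E}_{\pi_\theta, \mathcal{M}_c}\Big[\sum_{t=0}^{\infty} \gamma^t \mathcal{R}_c(s_t, a_t)\Big].
\end{aligned}
```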

Figures (11)

Figure 1: Two-phase training of predictive context encoding methods. Typically, the context encoder is frozen while the policy is learned in the second phase, so no gradient updates flow through it.
Figure 2: Joint training of the context encoder $\psi$ and policy/value networks $\pi_\theta$/$Q_\phi$
Figure 3: Episodic returns during training when learning with explicit context, hidden context and learned context embeddings. Our joint learning method (jcpl) is capable of reaching the same performance as if learning with the ground-truth context, outperforming the predictive identification baseline.
Figure 4: Policy training losses when learning with explicit context, hidden context and learned context embeddings. Loss is consistently lower when jointly learning the context representation and the task policy.
Figure 5: Interquartile Mean (IQM) of the aggregated normalized scores, with stratified bootstrap confidence intervals, in the interpolation and extrapolation settings.
...and 6 more figures
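Figure 5 aggregates results with the Interquartile Mean (IQM) and stratified bootstrap confidence intervals. A minimal NumPy sketch of both follows, assuming a score matrix of shape (runs, tasks) that has already been normalized. The function names, the 2000 bootstrap samples and the 95% percentile interval are illustrative choices, not the paper's stated protocol; the rliable library provides a maintained implementation of this evaluation methodology.

```python
import numpy as np


def iqm(scores):
    """Interquartile mean: mean of the middle 50% of all pooled scores,
    trimming int(0.25 * n) values from each end of the sorted array."""
    x = np.sort(np.asarray(scores, dtype=float).ravel())
    cut = int(0.25 * x.size)
    return x[cut:x.size - cut].mean()


def stratified_bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """scores: array of shape (n_runs, n_tasks). Runs are resampled with
    replacement independently within each task (the strata), the IQM is
    recomputed on each resample, and a percentile interval is returned."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n_runs, n_tasks = scores.shape
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # idx[i, j] picks which run to use for slot i of task j
        idx = rng.integers(0, n_runs, size=(n_runs, n_tasks))
        stats[b] = iqm(np.take_along_axis(scores, idx, axis=0))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return iqm(scores), (lo, hi)


# Hypothetical usage: 10 seeds x 5 evaluation contexts of normalized scores.
scores = np.random.default_rng(1).uniform(0.4, 0.9, size=(10, 5))
point, (lo, hi) = stratified_bootstrap_ci(scores)
```

Trimming the top and bottom quartiles makes the IQM robust to a few outlier runs while using more data than the median, and resampling within each task keeps every task represented in every bootstrap replicate.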
