A Survey of In-Context Reinforcement Learning

Amir Moeini, Jiuqi Wang, Jacob Beck, Ethan Blaser, Shimon Whiteson, Rohan Chandra, Shangtong Zhang

TL;DR

The paper addresses how reinforcement learning agents can adapt to new tasks without gradient-based parameter updates by conditioning on task-relevant context in the forward pass. It surveys two main pretraining paradigms (supervised pretraining and reinforcement pretraining), along with test-time context construction, theoretical analyses, and architectural choices that enable robust in-context adaptation and in-context improvement. Key contributions include synthesizing post-2022 advances that demonstrate strong out-of-distribution generalization, reviewing benchmarks and methods that enable test-time efficiency, and identifying open problems and scalable architectures. The survey points to the potential of long-context models to reduce inference computation, improve sample efficiency, and generalize across diverse tasks, while noting significant challenges in stability, real-world deployment, and theory.

Abstract

Reinforcement learning (RL) agents typically optimize their policies by performing expensive backward passes to update their network parameters. However, some agents can solve new tasks without updating any parameters by simply conditioning on additional context such as their action-observation histories. This paper surveys work on such behavior, known as in-context reinforcement learning.

Paper Structure

This paper contains 10 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Overview of ICRL. After pretraining, the forward pass of the network implements some RL algorithm. The implemented RL algorithm is tested on multiple MDPs. The context in each MDP can span multiple episodes.
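
The idea in the Figure 1 caption can be illustrated with a toy sketch. This is not a method from the survey: `frozen_policy` is a hypothetical, hand-written stand-in for a pretrained network's forward pass, and `run_task` is a hypothetical evaluation loop on a deterministic multi-armed bandit in which each episode is a single step. The point it shows is only that behaviour improves across episodes because the context grows, while nothing resembling a parameter is ever updated, and that the same function can be run on several tasks.

```python
from typing import List, Tuple

# Context: (action, reward) pairs accumulated across episodes of one task.
Context = List[Tuple[int, float]]


def frozen_policy(context: Context, n_actions: int) -> int:
    """Hypothetical stand-in for a pretrained network's forward pass.

    Holds no mutable state: the chosen action depends only on the context.
    """
    totals = [0.0] * n_actions
    counts = [0] * n_actions
    for action, reward in context:
        totals[action] += reward
        counts[action] += 1
    # Try every action once before exploiting.
    for a in range(n_actions):
        if counts[a] == 0:
            return a
    # Then pick the action with the best empirical mean reward.
    return max(range(n_actions), key=lambda a: totals[a] / counts[a])


def run_task(arm_rewards: List[float], n_episodes: int) -> List[float]:
    """Run the frozen policy on one bandit task; context spans all episodes."""
    context: Context = []
    returns = []
    for _ in range(n_episodes):
        action = frozen_policy(context, len(arm_rewards))
        reward = arm_rewards[action]  # assumption: deterministic rewards
        context.append((action, reward))
        returns.append(reward)
    return returns


# The same frozen function is evaluated on two different tasks.
task_a = run_task([0.1, 0.9, 0.5], n_episodes=5)
task_b = run_task([0.8, 0.2], n_episodes=4)
```

In a real ICRL system the hand-written rule would be replaced by a learned sequence model, and the context would typically hold observations as well as actions and rewards.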