A Survey of In-Context Reinforcement Learning
Amir Moeini, Jiuqi Wang, Jacob Beck, Ethan Blaser, Shimon Whiteson, Rohan Chandra, Shangtong Zhang
TL;DR
The paper addresses how reinforcement learning agents can adapt to new tasks without gradient-based parameter updates by conditioning on task-relevant context in the forward pass. It surveys two main pretraining paradigms (supervised pretraining and reinforcement pretraining), along with test-time context construction, theoretical analyses, and architectural choices that enable robust in-context adaptation and in-context improvement. Key contributions include synthesizing post-2022 advances that demonstrate strong out-of-distribution generalization, reviewing benchmarks and methods for test-time efficiency, and identifying open problems and scalable architectures. The survey points to the potential of long-context models to reduce inference computation, improve sample efficiency, and generalize across diverse tasks, and it notes significant challenges in stability, real-world deployment, and theory.
Abstract
Reinforcement learning (RL) agents typically optimize their policies by performing expensive backward passes to update their network parameters. However, some agents can solve new tasks without updating any parameters by simply conditioning on additional context such as their action-observation histories. This paper surveys work on such behavior, known as in-context reinforcement learning.
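To make the idea of adapting through context alone concrete, the following is a minimal sketch and not a method taken from the survey. It assumes a two-armed bandit task and replaces the pretrained sequence model with a hand-written UCB-style rule. The names `policy`, `run_episode`, and `explore_bonus` are hypothetical. The point it illustrates is that the decision rule is never modified: all task-specific behavior comes from the growing history passed in as context.

```python
import math
import random


def policy(context, n_arms, explore_bonus=1.0):
    """Fixed decision rule; all adaptation comes from `context`.

    `context` is a list of (action, reward) pairs from the current task.
    Nothing is learned or mutated here. The UCB-style rule is an assumed
    stand-in for the forward pass of a pretrained sequence model.
    """
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for action, reward in context:
        counts[action] += 1
        totals[action] += reward

    # Try every arm once before comparing them.
    for arm in range(n_arms):
        if counts[arm] == 0:
            return arm

    # Score each arm by its mean reward plus an exploration bonus.
    t = len(context)
    scores = [
        totals[a] / counts[a] + explore_bonus * math.sqrt(math.log(t) / counts[a])
        for a in range(n_arms)
    ]
    return max(range(n_arms), key=scores.__getitem__)


def run_episode(arm_probs, steps, seed=0):
    """Interact with a Bernoulli bandit, feeding the history back as context."""
    rng = random.Random(seed)
    context = []
    for _ in range(steps):
        action = policy(context, len(arm_probs))
        reward = 1.0 if rng.random() < arm_probs[action] else 0.0
        context.append((action, reward))  # the only thing that changes
    return context


if __name__ == "__main__":
    # The same unchanged rule is applied to two tasks with opposite best arms.
    for probs in ([0.0, 1.0], [1.0, 0.0]):
        history = run_episode(probs, steps=50)
        pulls = [sum(1 for a, _ in history if a == arm) for arm in range(2)]
        print(probs, pulls)
```

Running the same rule on two tasks with opposite best arms shows it favoring the better arm in each case, even though no parameter is updated between or within episodes. In the in-context RL setting described above, a pretrained network would play the role of `policy`.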
