From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers
Ching Fang, Kanaka Rajan
TL;DR
This work investigates how episodic memory enables rapid in-context reinforcement learning in transformers by training decision-pretrained transformers on gridworld and tree-maze planning tasks. It shows that rapid adaptation arises from memory-based computations stored in context-memory tokens, not from traditional model-free or model-based strategies, and that the model's representations exhibit in-context structure learning with cross-context alignment. Using attribution and decoding analyses, the authors demonstrate that decisions rely on cached intermediate computations rather than full path planning, and that memory tokens encode coordinates and path information in a hippocampal-like manner. The findings suggest that memory serves as a computational workspace supporting flexible, rapid adaptation in novel environments, with implications for understanding natural cognition and guiding future memory-augmented architectures.
Abstract
Humans and animals show remarkable learning efficiency, adapting to new environments with minimal experience. This capability is not well captured by standard reinforcement learning algorithms that rely on incremental value updates. Rapid adaptation likely depends on episodic memory, the ability to retrieve specific past experiences to guide decisions in novel contexts. Transformers provide a useful setting for studying these questions because of their ability to learn rapidly in-context and because their key-value architecture resembles episodic memory systems in the brain. We train a transformer to perform in-context reinforcement learning on a distribution of planning tasks inspired by rodent behavior. We then characterize the learning algorithms that emerge in the model. We first find that representation learning is supported by in-context structure learning and cross-context alignment, where representations are aligned across environments with different sensory stimuli. We next demonstrate that the reinforcement learning strategies developed by the model are not interpretable as standard model-free or model-based planning. Instead, we show that in-context reinforcement learning is supported by caching intermediate computations within the model's memory tokens, which are then accessed at decision time. Overall, we find that memory may serve as a computational resource, storing both raw experience and cached computations to support flexible behavior. Furthermore, the representations developed in the model resemble computations associated with the hippocampal-entorhinal system in the brain, suggesting that our findings may be relevant for natural cognition. Taken together, our work offers a mechanistic hypothesis for the rapid adaptation that underlies in-context learning in artificial and natural settings.
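To make the key-value analogy concrete, the following is a minimal sketch of standard scaled dot-product attention for a single query, framed as retrieval from a small memory. It is not the authors' model or analysis code: the function names (`softmax`, `retrieve`), the toy 2-D keys and values, and the reading of each key-value pair as one stored experience are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(query, keys, values):
    """Scaled dot-product attention for a single query.

    Each (key, value) pair stands in for one stored memory; the query
    stands in for the state at decision time (illustrative framing only).
    Returns the attention weights and the weighted sum of values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    readout = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, readout


# Toy memory: three stored experiences with made-up 2-D keys and values.
keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [4.0, 0.0]  # most similar to the first stored key

weights, readout = retrieve(query, keys, values)
```

In this toy setting the query matches the first key most closely, so nearly all of the attention weight falls on the first memory and the readout is close to its value. The same retrieval step applies whether the stored values hold raw experience or cached intermediate computations, which is the distinction the abstract draws.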
