Cognitive Maps in Language Models: A Mechanistic Analysis of Spatial Planning
Caroline Baumgartner, Eleanor Spens, Neil Burgess, Petru Manescu
TL;DR
The paper investigates how transformer-based language models develop spatial intelligence under different training regimes, contrasting exploration-driven training (Foraging) with goal-directed shortest-path (SP) training and a hybrid of the two. It trains three GPT-2 small models on a grid-world navigation task and applies behavioural, representational, and mechanistic analyses, which reveal two distinct algorithms: a cognitive map in the Foraging model and a path-dependent planner in the SP models. Mechanistic interventions show that the Foraging model uses a local update circuit in Layer 1 and consolidates spatial information in later layers into a self-sufficient spatial map, whereas the SP models remain reliant on explicit directional inputs; the hybrid SP-Random Walk (SP-RW) model improves generalisation but retains path-dependence. Overall, the results suggest that spatial intelligence in transformers exists on a spectrum shaped by training data, with implications for designing curricula that balance generalisation and planning efficiency.
Abstract
How do large language models solve spatial navigation tasks? We investigate this by training GPT-2 models on three spatial learning paradigms in grid environments: passive exploration (the Foraging model, which predicts steps in random walks), goal-directed planning on structured Hamiltonian paths (SP-Hamiltonian, which generates optimal shortest paths), and a hybrid of the two (SP-Random Walk, fine-tuned with exploratory data). Using behavioural, representational and mechanistic analyses, we uncover two fundamentally different learned algorithms. The Foraging model develops a robust, map-like representation of space, akin to a 'cognitive map'. Causal interventions reveal that it learns to consolidate spatial information into a self-sufficient coordinate system, evidenced by a sharp phase transition in which its reliance on historical direction tokens vanishes by the middle layers of the network. The model also adopts an adaptive, hierarchical reasoning system, switching between a low-level heuristic for short contexts and map-based inference for longer ones. In contrast, the goal-directed models learn a path-dependent algorithm and remain reliant on explicit directional inputs throughout all layers. The hybrid model, despite demonstrating improved generalisation over its parent, retains the same path-dependent strategy. These findings suggest that the nature of spatial intelligence in transformers may lie on a spectrum, ranging from generalisable world models shaped by exploratory data to heuristics optimised for goal-directed tasks. We provide a mechanistic account of this generalisation-optimisation trade-off and highlight how the choice of training regime influences the strategies that emerge.
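To make the contrast between exploratory and goal-directed training data concrete, the following is a minimal sketch of how the two kinds of sequence might be generated on a square grid. It is an illustration under stated assumptions, not the paper's pipeline: the coordinate-string node names, the N/S/E/W direction tokens, the interleaved node/direction layout, and the function names are all hypothetical, and the Hamiltonian-path structure used for the SP models is not reproduced here.

```python
import random
from collections import deque

# Hypothetical direction tokens and their (dx, dy) offsets; the paper's
# actual vocabulary is not given in the abstract.
MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}


def neighbours(pos, size):
    """Yield (direction, next_position) for every in-bounds move."""
    x, y = pos
    for direction, (dx, dy) in MOVES.items():
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield direction, (nx, ny)


def node_token(pos):
    """Assumed node naming: the cell's coordinates as a string."""
    return f"{pos[0]},{pos[1]}"


def random_walk(size, steps, rng):
    """Exploratory sequence: node, direction, node, ... along a random walk."""
    pos = (rng.randrange(size), rng.randrange(size))
    tokens = [node_token(pos)]
    for _ in range(steps):
        direction, pos = rng.choice(list(neighbours(pos, size)))
        tokens += [direction, node_token(pos)]
    return tokens


def shortest_path(size, start, goal):
    """Goal-directed sequence: one optimal path from start to goal via BFS."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            break
        for direction, nxt in neighbours(current, size):
            if nxt not in parent:
                parent[nxt] = (current, direction)
                queue.append(nxt)
    # Walk back from the goal to the start, then reverse the token list.
    reversed_tokens = []
    node = goal
    while parent[node] is not None:
        previous, direction = parent[node]
        reversed_tokens += [node_token(node), direction]
        node = previous
    reversed_tokens.append(node_token(start))
    return reversed_tokens[::-1]
```

Under these assumptions, a call such as `random_walk(4, 10, random.Random(0))` yields an interleaved sequence of 11 node tokens and 10 direction tokens, while `shortest_path(4, (0, 0), (3, 2))` yields a sequence whose number of direction tokens equals the Manhattan distance between the two cells. A model trained on the first kind of sequence sees many redundant revisits of the same cells, whereas one trained on the second only ever sees efficient routes.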
