
Cognitive Maps in Language Models: A Mechanistic Analysis of Spatial Planning

Caroline Baumgartner, Eleanor Spens, Neil Burgess, Petru Manescu

TL;DR

The paper investigates how transformer-based language models develop spatial intelligence under different training regimes, contrasting exploration-driven Foraging training with goal-directed shortest-path (SP) training and a hybrid of the two. It trains three GPT-2 small models on grid-world navigation tasks and applies behavioural, representational, and mechanistic analyses, revealing two distinct algorithms: a cognitive map that emerges in the Foraging model and a path-dependent planner in the SP models. Mechanistic interventions show that the Foraging model uses a Layer 1 local update circuit and then consolidates position into a self-sufficient spatial map by Layers 6–8, whereas the SP models keep relying on directional inputs throughout. The SP-RW hybrid improves generalisation but retains path-dependence. Overall, the results suggest that spatial intelligence in transformers lies on a spectrum shaped by training data, with implications for designing curricula that balance generalisation and planning efficiency.
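To make the Foraging training task concrete, the sketch below samples a random walk on an n×n grid, serialises it as alternating node and direction tokens, and lists the valid next moves from the final node, which are the targets a Foraging-style model would be trained to predict. It is a minimal illustration under stated assumptions: the coordinate-string node tokens, the N/S/E/W direction tokens and the helper names (`random_walk`, `serialise`, `valid_moves`) are hypothetical, not the paper's vocabulary or code. In the paper's setup node tokens need not encode coordinates at all; coordinates are used here only for readability. The 3×3 grid and 50-step walk simply mirror the setting given in the Figure 2 caption.

```python
import random

# Hypothetical token scheme (an assumption, not the paper's format):
# nodes are written as "r,c" strings and moves as "N", "S", "E", "W".
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def valid_moves(node, n):
    """Return the directions that keep `node` inside an n x n grid."""
    r, c = node
    return [d for d, (dr, dc) in MOVES.items() if 0 <= r + dr < n and 0 <= c + dc < n]


def random_walk(n, steps, seed=None):
    """Sample a uniform random walk of `steps` moves on an n x n grid (n >= 2).

    Returns the visited nodes (length steps + 1) and the direction taken at each step.
    """
    rng = random.Random(seed)
    node = (rng.randrange(n), rng.randrange(n))
    nodes, dirs = [node], []
    for _ in range(steps):
        d = rng.choice(valid_moves(node, n))
        dr, dc = MOVES[d]
        node = (node[0] + dr, node[1] + dc)
        nodes.append(node)
        dirs.append(d)
    return nodes, dirs


def serialise(nodes, dirs):
    """Interleave node and direction tokens: node, dir, node, dir, ..., node."""
    tokens = [f"{nodes[0][0]},{nodes[0][1]}"]
    for d, (r, c) in zip(dirs, nodes[1:]):
        tokens += [d, f"{r},{c}"]
    return tokens


nodes, dirs = random_walk(n=3, steps=50, seed=0)
tokens = serialise(nodes, dirs)
targets = valid_moves(nodes[-1], n=3)  # correct next-step predictions after the last node
```

Because every in-bounds move is a correct answer, the prediction target is a set of directions rather than a single token, which matches the "valid next steps" framing in Figure 1b.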

Abstract

How do large language models solve spatial navigation tasks? We investigate this by training GPT-2 models on three spatial learning paradigms in grid environments: passive exploration (Foraging Model: predicting steps in random walks), goal-directed planning (generating optimal shortest paths given structured Hamiltonian paths as context; SP-Hamiltonian), and a hybrid in which the goal-directed model is further fine-tuned on exploratory data (SP-Random Walk). Using behavioural, representational and mechanistic analyses, we uncover two fundamentally different learned algorithms. The Foraging model develops a robust, map-like representation of space, akin to a 'cognitive map'. Causal interventions reveal that it learns to consolidate spatial information into a self-sufficient coordinate system, evidenced by a sharp phase transition in which its reliance on historical direction tokens vanishes by the middle layers of the network. The model also adopts an adaptive, hierarchical reasoning strategy, switching from a low-level heuristic for short contexts to map-based inference for longer ones. In contrast, the goal-directed models learn a path-dependent algorithm, remaining reliant on explicit directional inputs throughout all layers. The hybrid model, despite generalising better than its parent, retains the same path-dependent strategy. These findings suggest that spatial intelligence in transformers may lie on a spectrum, ranging from generalisable world models shaped by exploratory data to heuristics optimised for goal-directed tasks. We provide a mechanistic account of this generalisation-optimisation trade-off and highlight how the choice of training regime shapes the strategies that emerge.
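The goal-directed paradigm can be illustrated in the same spirit. The sketch below builds one Hamiltonian path over an n×n grid (a row-by-row zig-zag that visits every node exactly once, standing in for the structured context) and uses breadth-first search to count how many distinct shortest paths join a start and a goal node. It assumes an open grid with 4-neighbour moves and no obstacles, so the shortest-path length equals the Manhattan distance and several shortest paths usually exist. The path generator and all function names are illustrative assumptions, not the paper's data pipeline.

```python
from collections import deque

# 4-neighbour moves on an open grid (assumption: no walls or obstacles).
STEPS = ((-1, 0), (1, 0), (0, 1), (0, -1))


def snake_hamiltonian(n):
    """One Hamiltonian path on an n x n grid: a row-by-row zig-zag (boustrophedon).

    Every node is visited exactly once. It is only one of many Hamiltonian
    paths and stands in for whatever context paths the paper actually used.
    """
    path = []
    for r in range(n):
        cols = range(n) if r % 2 == 0 else range(n - 1, -1, -1)
        path.extend((r, c) for c in cols)
    return path


def neighbours(node, n):
    """Yield the in-bounds 4-neighbours of `node`."""
    r, c = node
    for dr, dc in STEPS:
        if 0 <= r + dr < n and 0 <= c + dc < n:
            yield (r + dr, c + dc)


def count_shortest_paths(start, goal, n):
    """Breadth-first search from `start`; return (distance, number of shortest paths) to `goal`."""
    dist = {start: 0}
    ways = {start: 1}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbours(u, n):
            if v not in dist:  # first time reached: its BFS distance is now fixed
                dist[v] = dist[u] + 1
                ways[v] = ways[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:  # another predecessor on a shortest path
                ways[v] += ways[u]
    return dist[goal], ways[goal]


context = snake_hamiltonian(3)
print(context)
print(count_shortest_paths((0, 0), (2, 2), 3))
```

On a 3×3 grid, going from (0,0) to (2,2) takes 4 moves and there are C(4,2) = 6 distinct shortest paths, which is why a shortest-path task of this kind has to accept any of several valid outputs.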

Paper Structure

This paper contains 25 sections, 6 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: (a) Evolution of the Foraging Model's navigation strategy with increasing context length. Purple: single-step prediction accuracy; red: bias towards reversing the last direction. The transition from high reverse bias to uniform prediction suggests a shift from local heuristics to global spatial reasoning. (b) Example tasks for the Foraging and SP-H models. Top: Foraging Model training uses random walks as context (left) and predicts valid next steps (right, red arrows). Bottom: SP-H training uses Hamiltonian paths as context (left, blue arrows) and predicts a shortest path between the start (red) and end (green) nodes (right; multiple valid paths shown).
  • Figure 2: Left: PCA comparison of the Foraging (top) and SP-RW (bottom) models. Node representations are averaged over all occurrences across 1,000 unique 50-step random walks on 3×3 grids; points are coloured by grid coordinate. Right: Horizontal Mirroring Effect in the SP-Hamiltonian model. Coordinates symmetric across the central horizontal axis of the 4×4 grid ((0,0) ≈ (0,2); (2,0) ≈ (2,2)) cluster together in PCA space.
  • Figure 3: Left: Layer 1 attention patterns for the final direction token in a 4-hop loop (Foraging Model). Top-right: Direction-token ablations for different loop lengths (2–12 hops, Foraging Model). Early layers (1–2) solve short loops locally, while Layers 6–8 recover global accuracy, indicating an internal spatial representation. Bottom-right: Direction-token ablation for SP models. SP models show gradual recovery, revealing a continuous dependence on direction tokens and an absence of spatial abstraction. Averaged over 1,000 trials; error bars = ±1 SD; shaded areas = 95% CIs.
  • Figure 4: SP-model performance as a function of the Manhattan distance (MD) between start and goal on 4×4 and 5×5 grids. (A) 4×4 grids: SP-H (purple) and SP-RW (green) performance across different context types. (B) 5×5 grids: generalisation performance of SP-Hamiltonian (blue) and SP-RW (green). SP-RW declines gradually from 95% at MD 1 to 22% at MD 8, while SP-Hamiltonian performs reasonably only on very short paths (83% at MD 1) before degrading rapidly to 0% by MD 7.
  • Figure 5: 3D PCA of Layer 12 node-token hidden states (Foraging Model). Data from 1,000 random walks of length 120 on unique 4×4 grids. Nodes cluster by navigational affordance: corner nodes (2 available directions, N=4), edge nodes (3 available directions, N=8), and centre nodes (4 available directions, N=4). Functional clustering replaces coordinate organisation: nodes group by possible moves rather than by spatial position, indicating an action-oriented representation.
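Two quantities in these captions follow directly from grid geometry: the affordance classes of Figure 5 (how many moves each node allows) and the Manhattan-distance bins of Figure 4. The sketch below computes both for an n×n grid with 4-neighbour moves and no obstacles (an assumption); the helper names are hypothetical. It reproduces the 4/8/4 corner/edge/centre split quoted for 4×4 grids and shows that MD tops out at 6 on 4×4 grids and at 8 on 5×5 grids.

```python
from collections import Counter
from itertools import product

# 4-neighbour moves on an open grid (an assumption: no walls or obstacles).
STEPS = ((-1, 0), (1, 0), (0, 1), (0, -1))
LABELS = {2: "corner", 3: "edge", 4: "centre"}


def affordance(node, n):
    """Number of moves that stay inside an n x n grid from `node`."""
    r, c = node
    return sum(0 <= r + dr < n and 0 <= c + dc < n for dr, dc in STEPS)


def affordance_classes(n):
    """Count nodes of each affordance class (corner / edge / centre) on an n x n grid, n >= 2."""
    return Counter(LABELS[affordance(v, n)] for v in product(range(n), repeat=2))


def manhattan(a, b):
    """Manhattan (L1) distance between two grid coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pairs_by_distance(n):
    """Count ordered (start, goal) pairs with start != goal, keyed by Manhattan distance."""
    nodes = list(product(range(n), repeat=2))
    return Counter(manhattan(a, b) for a in nodes for b in nodes if a != b)


print(affordance_classes(4))
print(max(pairs_by_distance(4)), max(pairs_by_distance(5)))
```

The distance bins are also very unevenly populated: on a 5×5 grid only 4 of the 600 ordered start-goal pairs have MD 8, against 80 at MD 1. If test pairs were drawn uniformly (an assumption; the sampling scheme is not stated here), accuracies at the largest distances would rest on few distinct pairs.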