
From Human Cognition to Neural Activations: Probing the Computational Primitives of Spatial Reasoning in LLMs

Jiyuan An, Liner Yang, Mengyan Wang, Luming Lu, Weihua An, Erhong Yang

Abstract

As spatial intelligence becomes an increasingly important capability for foundation models, it remains unclear whether the performance of large language models (LLMs) on spatial reasoning benchmarks reflects structured internal spatial representations or reliance on linguistic heuristics. We address this question from a mechanistic perspective by examining how spatial information is internally represented and used. Drawing on computational theories of human spatial cognition, we decompose spatial reasoning into three primitives (relational composition, representational transformation, and stateful spatial updating) and design controlled task families for each. We evaluate multilingual LLMs in English, Chinese, and Arabic under single-pass inference, and analyze internal representations using linear probing, sparse-autoencoder-based feature analysis, and causal interventions. We find that task-relevant spatial information is encoded in intermediate layers and can causally influence behavior, but these representations are transient, fragmented across task families, and weakly integrated into final predictions. Cross-linguistic analysis further reveals mechanistic degeneracy, where similar behavioral performance arises from distinct internal pathways. Overall, our results suggest that current LLMs exhibit limited, context-dependent spatial representations rather than robust, general-purpose spatial reasoning, highlighting the need for mechanistic evaluation beyond benchmark accuracy.
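The layer-wise linear probing described above can be sketched as follows: for each layer, a linear probe is fit from hidden activations to a continuous spatial variable, and held-out R² measures how linearly decodable that variable is at that depth. This is a minimal illustration on synthetic data, not the paper's actual pipeline; the function name, the use of ridge regression, and the toy activations are all assumptions for demonstration.

```python
# Minimal sketch of layer-wise linear probing for a spatial variable.
# All names and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def layerwise_probe_r2(hidden_states, targets, alpha=1.0, seed=0):
    """Fit one ridge probe per layer (activations -> spatial variable)
    and return the held-out R^2 score for each layer.

    hidden_states: array of shape (n_layers, n_examples, d_model)
    targets:       array of shape (n_examples,)
    """
    scores = []
    for layer_acts in hidden_states:
        X_tr, X_te, y_tr, y_te = train_test_split(
            layer_acts, targets, test_size=0.25, random_state=seed)
        probe = Ridge(alpha=alpha).fit(X_tr, y_tr)
        scores.append(r2_score(y_te, probe.predict(X_te)))
    return scores

# Synthetic demo: the spatial variable is linearly decodable only in the
# middle "layer", mimicking the mid-layer peak pattern reported in the paper.
rng = np.random.default_rng(0)
n_examples, d_model = 400, 32
y = rng.normal(size=n_examples)
layers = [rng.normal(size=(n_examples, d_model)) for _ in range(3)]
layers[1] = layers[1].copy()
layers[1][:, 0] = y  # inject the signal into the middle layer only
scores = layerwise_probe_r2(np.stack(layers), y)
```

On this toy input, `scores` should peak at the middle layer and stay near zero elsewhere, which is the qualitative signature the probing analysis looks for across real model layers.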

Paper Structure

This paper contains 62 sections, 6 figures, 4 tables, 3 algorithms.

Figures (6)

  • Figure 1: From human spatial cognition to spatial representations in large language models.
  • Figure 2: Illustration of the three spatial task families proposed in this work, with examples shown in English for clarity.
  • Figure 3: Overview of the proposed framework. We probe intermediate representations of a neural network, extract interpretable features using a sparse autoencoder, and analyze their roles via gradient-based attribution and causal interventions.
  • Figure 4: Layer-wise R² scores for spatial variable prediction across three task families (Qwen2.5-7B-Instruct, English). All tasks show mid-layer peaks followed by sharp declines in final layers. Task Family 1 and 3 demonstrate strong representational clarity (R² up to 0.37 and 0.40 respectively), while Task Family 2 shows minimal spatial encoding.
  • Figure 5: Feature importance versus activation frequency for Task Family 2 (Qwen2.5-7B-Instruct, Chinese). Important features are sparse and not aligned with activation frequency, indicating a dissociation between usage and causal contribution.
  • ...and 1 more figure