Can LLMs Learn to Map the World from Local Descriptions?

Sirui Xia, Aili Chen, Xintao Wang, Tinghui Zhu, Yikai Zhang, Jiangjie Chen, Yanghua Xiao

TL;DR

This work investigates whether large language models can construct global spatial cognition from locally relative descriptions and trajectory data in a synthetic urban setting. Using a two-stage continual pre-training pipeline on relational and trajectory data, the authors evaluate spatial perception, latent geometry, and navigation through explicit predictions, latent probes, and perturbation robustness. They find that LLMs generalize to unseen POI relationships, encode coordinates and geometry in latent space, and can plan routes between unconnected POIs, though robustness to path perturbations remains limited and data distribution strongly shapes resilience. The results suggest that language-based models can autonomously develop structured spatial representations, offering a foundation for spatial reasoning and navigation without explicit coordinates or visual input, while highlighting current limits and the role of data distribution. These insights advance our understanding of grounding LLMs for spatial cognition and inform future work on robust, distributed spatial planning with language-driven agents.

Abstract

Recent advances in Large Language Models (LLMs) have demonstrated strong capabilities in tasks such as code and mathematics. However, their potential to internalize structured spatial knowledge remains underexplored. This study investigates whether LLMs, grounded in locally relative human observations, can construct coherent global spatial cognition by integrating fragmented relational descriptions. We focus on two core aspects of spatial cognition: spatial perception, where models infer consistent global layouts from local positional relationships, and spatial navigation, where models learn road connectivity from trajectory data and plan optimal paths between unconnected locations. Experiments conducted in a simulated urban environment demonstrate that LLMs not only generalize to unseen spatial relationships between points of interest (POIs) but also exhibit latent representations aligned with real-world spatial distributions. Furthermore, LLMs can learn road connectivity from trajectory descriptions, enabling accurate path planning and dynamic spatial awareness during navigation.
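To make "locally relative descriptions" concrete, the sketch below turns two POI coordinates into a distance, an azimuth, and a short sentence. It is only an illustration under stated assumptions: the planar coordinate convention (x = east, y = north), the eight-way compass labels, the sentence template, and the names `relative_relation` and `describe` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical compass labels, clockwise starting from north.
DIRECTIONS = ["north", "northeast", "east", "southeast",
              "south", "southwest", "west", "northwest"]


def relative_relation(a, b):
    """Return (distance, azimuth in degrees) from point a to point b.

    Assumes planar coordinates with x pointing east and y pointing north.
    Azimuth is measured clockwise from north, in the range [0, 360).
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    distance = math.hypot(dx, dy)
    azimuth = math.degrees(math.atan2(dx, dy)) % 360
    return distance, azimuth


def describe(name_a, a, name_b, b):
    """Render one local observation as text (template is an assumption)."""
    distance, azimuth = relative_relation(a, b)
    # Each label covers a 45-degree sector centred on its compass direction.
    label = DIRECTIONS[int((azimuth + 22.5) // 45) % 8]
    return f"{name_b} is {distance:.0f} m {label} of {name_a}."


sentence = describe("Cafe", (0, 0), "Library", (300, 400))
```

A training corpus in this spirit would contain many such pairwise sentences, with some POI pairs held out to test whether the model can infer relations it never saw.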

Paper Structure

This paper contains 66 sections, 11 figures, 23 tables.

Figures (11)

  • Figure 1: Summary of our research framework. First, we construct a simulated environment and generate training data capturing relative spatial relations and shortest paths. Then, we apply continual pre-training to the LLM and evaluate its spatial cognition through explicit prediction tasks and latent representation analysis.
  • Figure 2: Consistency between POI latent representations and actual spatial locations. Spearman and Pearson correlation coefficients quantify monotonic and linear relationships, respectively.
  • Figure 3: Latent spatial composition evaluation. An MLP predicts distance and azimuth between POI pairs using their concatenated hidden states. We use MAE to measure the deviation between the predicted and true values, and use R² and Spearman correlation to assess the consistency.
  • Figure 4: Heatmap of turning point frequencies. The left side shows the training data statistics, while the right side shows the test data statistics.
  • Figure 5: The model's performance under different frequency thresholds.
  • ...and 6 more figures
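
The captions of Figures 2 and 3 name four evaluation metrics: Pearson and Spearman correlation, MAE, and R². The sketch below shows how these are commonly computed, using only the Python standard library. The function names and the sample distances are made up for illustration; this is not the authors' evaluation code, and it does not include the MLP probe itself.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Linear correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Monotonic correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up true vs. predicted POI-pair distances (illustrative only).
true_dist = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_dist = [1.1, 2.3, 2.9, 4.4, 4.8]

scores = {
    "pearson": pearson(true_dist, pred_dist),
    "spearman": spearman(true_dist, pred_dist),
    "mae": mae(true_dist, pred_dist),
    "r2": r2(true_dist, pred_dist),
}
```

Note that azimuth is a circular quantity, so a plain absolute difference can overstate the error near the 0°/360° boundary; whether the paper wraps angular errors is not stated in the captions above.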