Can LLMs Learn to Map the World from Local Descriptions?
Sirui Xia, Aili Chen, Xintao Wang, Tinghui Zhu, Yikai Zhang, Jiangjie Chen, Yanghua Xiao
TL;DR
This work investigates whether large language models can construct global spatial cognition from locally relative descriptions and trajectory data in a synthetic urban setting. Using a two-stage continual pre-training pipeline on relational and trajectory data, the authors evaluate spatial perception, latent geometry, and navigation through explicit predictions, latent probes, and perturbation robustness tests. They find that LLMs generalize to unseen relationships between points of interest (POIs), encode coordinates and geometry in latent space, and can plan routes between unconnected POIs. Robustness to path perturbations remains limited, however, and the training data distribution strongly shapes resilience. The results suggest that language-based models can autonomously develop structured spatial representations, offering a foundation for spatial reasoning and navigation without explicit coordinates or visual input. These insights advance the understanding of grounding LLMs for spatial cognition and inform future work on robust, distributed spatial planning with language-driven agents.
Abstract
Recent advances in Large Language Models (LLMs) have demonstrated strong capabilities in tasks such as code and mathematics. However, their potential to internalize structured spatial knowledge remains underexplored. This study investigates whether LLMs, grounded in locally relative human observations, can construct coherent global spatial cognition by integrating fragmented relational descriptions. We focus on two core aspects of spatial cognition: spatial perception, where models infer consistent global layouts from local positional relationships, and spatial navigation, where models learn road connectivity from trajectory data and plan optimal paths between unconnected locations. Experiments conducted in a simulated urban environment demonstrate that LLMs not only generalize to unseen spatial relationships between points of interest (POIs) but also exhibit latent representations aligned with real-world spatial distributions. Furthermore, LLMs can learn road connectivity from trajectory descriptions, enabling accurate path planning and dynamic spatial awareness during navigation.
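To make the navigation task concrete, the following minimal sketch shows what "learning road connectivity from trajectory data and planning a path between unconnected locations" amounts to when done with an explicit graph. It is not the authors' method, which trains an LLM on textual trajectory descriptions; it is a classical reference procedure that could serve as ground truth for such an evaluation. The POI names, the `build_graph` and `shortest_path` helpers, and the assumption of undirected, unweighted road segments (so "optimal" means fewest hops) are all illustrative assumptions.

```python
from collections import deque


def build_graph(trajectories):
    """Build an undirected adjacency map from trajectories.

    Each trajectory is a list of POI names visited in order; consecutive
    POIs are assumed to be joined by a road segment (an assumption made
    for this sketch).
    """
    graph = {}
    for path in trajectories:
        for a, b in zip(path, path[1:]):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Return a fewest-hop path from start to goal via BFS, or None."""
    if start not in graph or goal not in graph:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph[node]):  # sorted for deterministic order
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical trajectories: "Cafe" and "Museum" never co-occur in one.
trajectories = [
    ["Cafe", "Library", "Park"],
    ["Park", "Museum"],
    ["Library", "School"],
]
graph = build_graph(trajectories)
route = shortest_path(graph, "Cafe", "Museum")
print(route)
```

Here the route from "Cafe" to "Museum" has to be stitched together from two separate trajectories, which mirrors the setting in which the model must plan between POIs that were never observed on the same path.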
