On the Scaling Laws of Geographical Representation in Language Models
Nathan Godey, Éric de la Clergerie, Benoît Sagot
TL;DR
This work investigates how the geographical knowledge embedded in the hidden representations of language models evolves with model scale across diverse architectures. A linear ridge probe maps the latent representations of prompts to geographical coordinates on the World dataset, with $R^2$ as the performance metric. The results show that geographical signals exist even in tiny models and improve with scale. Crucially, larger models exhibit a stronger geographical bias tied to the pretraining data: coordinate accuracy correlates with country-name frequency, whereas population counts show little relation to it. These findings imply that scaling up LLMs can amplify data-driven geographical biases, which underscores the need for data-centric bias mitigation alongside careful consideration of pretraining corpora.
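As an illustration of the probing setup summarized above, the following is a minimal sketch of a linear ridge probe evaluated with $R^2$. It is not the authors' implementation: the hidden states and the (latitude, longitude) targets are replaced by synthetic data, and the function names, the regularization strength `alpha`, the train/test split, and the choice to average $R^2$ over the two coordinates are all assumptions made for the example.

```python
import numpy as np


def fit_ridge_probe(H, Y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    H: (n, d) array standing in for hidden representations of prompts.
    Y: (n, 2) array standing in for (latitude, longitude) targets.
    alpha: L2 regularization strength (an assumed value, not from the paper).
    """
    h_mean, y_mean = H.mean(axis=0), Y.mean(axis=0)
    Hc, Yc = H - h_mean, Y - y_mean  # centering handles the intercept
    d = H.shape[1]
    # Solve (Hc^T Hc + alpha * I) W = Hc^T Yc for the probe weights W.
    W = np.linalg.solve(Hc.T @ Hc + alpha * np.eye(d), Hc.T @ Yc)
    b = y_mean - h_mean @ W
    return W, b


def r2_score(Y_true, Y_pred):
    """Coefficient of determination, averaged over the output coordinates."""
    ss_res = ((Y_true - Y_pred) ** 2).sum(axis=0)
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


# Synthetic stand-in data: a hidden linear map plus noise (hypothetical).
rng = np.random.default_rng(0)
n, d = 500, 32
H = rng.normal(size=(n, d))
W_true = rng.normal(size=(d, 2))
Y = H @ W_true + 0.1 * rng.normal(size=(n, 2))

# Fit on the first 400 examples, evaluate on the remaining 100.
train, test = slice(0, 400), slice(400, None)
W, b = fit_ridge_probe(H[train], Y[train], alpha=1.0)
r2 = r2_score(Y[test], H[test] @ W + b)
print(f"held-out R^2: {r2:.3f}")
```

In an actual probing experiment, `H` would be replaced by representations extracted from a given layer of the language model and `Y` by the true coordinates of each location; a higher held-out $R^2$ then indicates that the coordinates are more linearly decodable from the representations.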
Abstract
Language models have long been shown to embed geographical information in their hidden representations. This line of work has recently been revisited by extending this result to Large Language Models (LLMs). In this paper, we propose to fill the gap between well-established and recent literature by observing how geographical knowledge evolves when scaling language models. We show that geographical knowledge is observable even for tiny models, and that it scales consistently as we increase the model size. Notably, we observe that larger language models cannot mitigate the geographical bias that is inherent to the training data.
