GeoLLM: Extracting Geospatial Knowledge from Large Language Models

Rohin Manvi; Samar Khanna; Gengchen Mai; Marshall Burke; David Lobell; Stefano Ermon

TL;DR

The paper investigates whether large language models encode geospatial knowledge and introduces GeoLLM, a framework that augments prompts with OpenStreetMap data and fine-tunes models to predict geospatial variables. By combining a minimum viable geospatial prompt with map-derived context (address and nearby places) and tailored fine-tuning, GeoLLM achieves a ~70% gain in Pearson's $r^2$ over baselines that use nearest neighbors or prompt information directly, and it matches or exceeds satellite-based benchmarks. The approach applies across tasks and regions of the world and is sample-efficient. GPT-3.5 outperforms Llama 2 and RoBERTa by 19% and 51% respectively, which suggests that performance scales with model size and pretraining data. The work highlights the potential of LLMs to mitigate the limitations of conventional covariates and to complement them in geospatial prediction pipelines. Code is available on the project website.

Abstract

The application of machine learning (ML) in a range of geospatial tasks is increasingly common but often relies on globally available covariates such as satellite imagery that can either be expensive or lack predictive power. Here we explore the question of whether the vast amounts of knowledge found in Internet language corpora, now compressed within large language models (LLMs), can be leveraged for geospatial prediction tasks. We first demonstrate that LLMs embed remarkable spatial information about locations, but naively querying LLMs using geographic coordinates alone is ineffective in predicting key indicators like population density. We then present GeoLLM, a novel method that can effectively extract geospatial knowledge from LLMs with auxiliary map data from OpenStreetMap. We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods. Across these tasks, our method demonstrates a 70% improvement in performance (measured using Pearson's $r^2$) relative to baselines that use nearest neighbors or use information directly from the prompt, and performance equal to or exceeding satellite-based benchmarks in the literature. With GeoLLM, we observe that GPT-3.5 outperforms Llama 2 and RoBERTa by 19% and 51% respectively, suggesting that the performance of our method scales well with the size of the model and its pretraining dataset. Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe. Crucially, GeoLLM shows promise in mitigating the limitations of existing geospatial covariates and complementing them well. Code is available on the project website: https://rohinmanvi.github.io/GeoLLM

Paper Structure (20 sections, 5 figures, 5 tables)

Introduction
Related Work
Method
Minimum Viable Geospatial Prompt
Prompt with Map Data
Fine-tuning and Inference with Language Models
Experiments
Tasks and Sources
Baselines
Performance on Tasks
Ablations on the Prompt
Discussion
Conclusion
Appendix
Additional Visualizations of Performance
...and 5 more sections

Figures (5)

Figure 1: Example prompts and corresponding GPT responses. In the address-description panel, GPT-3.5 demonstrates its geospatial knowledge when asked to describe an address. However, in the prompts panel (top), prompting GPT-3.5 with just coordinates and fine-tuning it on population density is insufficient. The prompts panel (bottom) demonstrates our prompting strategy, with which a fine-tuned GPT-3.5 is able to solve the task correctly (the expected value is 9.0).
Figure 2: Plots of absolute error comparing GPT-3.5 with the best baselines that use no pretraining, on tasks from each source with 1,000 samples. We also provide high-resolution plots from various locations around the world for the population density task from WorldPop. GeoLLM not only outperforms baselines on a variety of tasks (see the results table) but also demonstrates remarkable geographic consistency across the world.
Figure 3: Mean Pearson's $r^2$ for models across all tasks at 1,000 training samples
Figure 4: Learning curves for the population density task from WorldPop.
Figure 5: The <latitude> and <longitude> are always rounded to 5 decimal places. The <address> is a reverse geocoded address. The <distance> is always in kilometers and is rounded to one decimal place. The <direction> is one of eight cardinal and intercardinal directions which are "North", "East", "South", "West", "North-East", "South-East", "South-West", or "North-West". The <place> is the actual name of the place. The <task> is the name of the task. In our experiments, the <task> is one of "Population Density", "Asset Wealth", "Women's Education", "Sanitation", "Women's BMI", "Mean Income", "Hispanic/Latino to Non-Hispanic/Latino Ratio", or "Home Value". The <label> is the ground truth (or completion, which is a prediction from an LLM during inference) which can be any number between 0.0 and 9.9 rounded to one decimal place.
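
The formatting rules in the Figure 5 caption can be sketched in Python. This is a minimal illustration, not the authors' implementation: the caption specifies only how each field is rounded and which values it may take, so the line layout of the prompt, the function names (`haversine_km`, `compass_direction`, `format_label`, `build_prompt`), the use of great-circle distance and initial bearing, and the clamping of labels are all assumptions made here.

```python
import math

# The eight directions named in the Figure 5 caption, in clockwise order from North.
DIRECTIONS = ["North", "North-East", "East", "South-East",
              "South", "South-West", "West", "North-West"]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (assumed distance measure)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def compass_direction(lat1, lon1, lat2, lon2):
    """Map the initial bearing from point 1 to point 2 onto one of eight directions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    # Each direction covers a 45-degree sector centered on its heading.
    return DIRECTIONS[int((bearing + 22.5) // 45) % 8]


def format_label(value):
    """Render a label in [0.0, 9.9] with one decimal place (clamping is an assumption)."""
    return f"{min(max(value, 0.0), 9.9):.1f}"


def build_prompt(lat, lon, address, places, task, label=None):
    """Assemble a prompt from the caption's fields.

    `places` is a list of (name, latitude, longitude) tuples.
    The line layout below is hypothetical; only the rounding rules come from the caption.
    """
    lines = [
        f"Coordinates: ({lat:.5f}, {lon:.5f})",  # 5 decimal places
        f"Address: {address}",                   # reverse geocoded address
        "Nearby Places:",
    ]
    for name, plat, plon in places:
        distance = haversine_km(lat, lon, plat, plon)
        direction = compass_direction(lat, lon, plat, plon)
        lines.append(f"{distance:.1f} km {direction}: {name}")  # 1 decimal place, km
    # With a label this is a training example; without one, the model completes it.
    completion = "" if label is None else f" {format_label(label)}"
    lines.append(f"{task}:{completion}")
    return "\n".join(lines)
```

Calling `build_prompt` with `label=None` leaves the final line open, which corresponds to the inference case in which the LLM supplies the completion; passing a label produces a fine-tuning example.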