Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
Jonathan Roberts, Timo Lüddecke, Rehan Sheikh, Kai Han, Samuel Albanie
TL;DR
This work evaluates the geographic and geospatial reasoning capabilities of multimodal LLMs, focusing on GPT-4V and several open-source baselines, using a curated, small-scale benchmark. The benchmark combines localisation, remote sensing (RS) interpretation, mapping, and flag-identification tasks across natural, abstract, and RS imagery to identify the models' strengths, weaknesses, and biases. Key findings show that GPT-4V attains broad task coverage and strong sentence-level reasoning but struggles with precise localisation and object-level delineation, while open-source models often excel at localisation and certain RS tasks. The authors plan to release their benchmark publicly to enable reproducibility and cross-model comparisons, and they highlight practical implications for navigation, environmental monitoring, and disaster response, as well as the need for awareness of regional biases in training data.
Abstract
Multimodal large language models (MLLMs) have shown remarkable capabilities across a broad range of tasks, but their knowledge and abilities in the geographic and geospatial domains are yet to be explored, despite potential wide-ranging benefits to navigation, environmental research, urban development, and disaster response. We conduct a series of experiments exploring various vision capabilities of MLLMs within these domains, particularly focusing on the frontier model GPT-4V, and benchmark its performance against open-source counterparts. Our methodology involves challenging these models with a small-scale geographic benchmark consisting of a suite of visual tasks, testing their abilities across a spectrum of complexity. The analysis uncovers not only where such models excel, including instances where they outperform humans, but also where they falter, providing a balanced view of their capabilities in the geographic domain. To enable the comparison and evaluation of future models, our benchmark will be publicly released.
