ChatGPT as a mapping assistant: A novel method to enrich maps with generative AI and content derived from street-level photographs
Levente Juhász, Peter Mooney, Hartwig H. Hochmair, Boyuan Guan
TL;DR
This paper investigates using generative AI to enrich maps by turning descriptions of street-level images into OpenStreetMap road tags, integrating two volunteered geographic information (VGI) sources (OSM and Mapillary) with off-the-shelf AI tools. The authors combine GPT-3.5-turbo and BLIP-2 in a Miami-area study, evaluating four prompt scenarios (Baseline, Locational context, Object detection, and OD+LC) against human analysts across 94 road segments. Results show that richer image descriptions and additional context substantially improve tagging accuracy: OD+LC yields the largest gains (up to ~19.5 percentage points), and semantic road-category tagging reaches up to ~60% accuracy in some configurations. The study highlights both the potential and the limitations of AI-assisted mapping, and suggests that larger-scale studies and multimodal spatial AI systems could advance practical GIS workflows while mitigating issues such as hallucinations and tagging heterogeneity.
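The four prompt scenarios differ only in how much context accompanies the photo descriptions. A minimal sketch of how such prompts could be assembled is shown below. The `RoadSegment` class, the prompt wording, and the assumption that the target is the OSM `highway=*` key are all hypothetical illustrations, not the authors' implementation; the actual call to GPT-3.5-turbo is omitted.

```python
from dataclasses import dataclass, field

# Hypothetical per-segment record; field names are illustrative, not from the paper.
@dataclass
class RoadSegment:
    descriptions: list[str]                  # human or BLIP-2 descriptions of Mapillary photos
    location: str = ""                       # locational context (assumed free-text form)
    detected_objects: list[str] = field(default_factory=list)  # objects detected along the road

SCENARIOS = ("Baseline", "LC", "OD", "OD+LC")

def build_prompt(segment: RoadSegment, scenario: str) -> str:
    """Assemble a tagging prompt for one scenario (hypothetical template)."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario}")
    parts = [
        "The following descriptions come from street-level photos taken along one road.",
        *(f"- {d}" for d in segment.descriptions),
    ]
    # Locational context is added only in the LC and OD+LC scenarios.
    if scenario in ("LC", "OD+LC") and segment.location:
        parts.append(f"Location context: {segment.location}")
    # Detected objects are added only in the OD and OD+LC scenarios.
    if scenario in ("OD", "OD+LC") and segment.detected_objects:
        parts.append("Objects detected along the road: " + ", ".join(segment.detected_objects))
    # Assumption: the task targets the OSM highway=* key.
    parts.append("Suggest the most appropriate OpenStreetMap highway=* tag for this road. "
                 "Answer with the tag only.")
    return "\n".join(parts)

if __name__ == "__main__":
    seg = RoadSegment(["Narrow two-lane street lined with single-family houses"],
                      location="residential neighbourhood",
                      detected_objects=["stop sign", "crosswalk"])
    print(build_prompt(seg, "OD+LC"))
```

Keeping each kind of context as an optional, independently switchable block makes it easy to compare scenarios on identical descriptions, which is the kind of controlled comparison the TL;DR describes.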
Abstract
This paper explores leveraging generative AI as a mapping assistant to enhance the efficiency of collaborative mapping. We present results of an experiment that combines multiple sources of volunteered geographic information (VGI) with large language models (LLMs). Three analysts described the content of crowdsourced Mapillary street-level photographs taken along roads in a small test area in Miami, Florida. GPT-3.5-turbo was then instructed to suggest the most appropriate OpenStreetMap (OSM) tagging for each road. The study also explores using BLIP-2, a state-of-the-art multimodal pre-training method, as an artificial analyst of street-level photographs alongside the human analysts. Results demonstrate two ways to effectively increase the accuracy of mapping suggestions without modifying the underlying AI models: (1) providing a more detailed description of the source photographs, and (2) combining prompt engineering with additional context (e.g., location and objects detected along a road). The first approach increases suggestion accuracy by up to 29%, and the second by up to 20%.
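Accuracy gains of the kind reported above can be expressed as the difference between two accuracy scores over the same road segments. The sketch below assumes that a suggestion counts as correct only when it exactly matches the reference OSM tag (case- and whitespace-insensitive) and reports gains in percentage points; both the matching rule and the function names are hypothetical, and the toy tags in the usage example are invented for illustration, not data from the study.

```python
def accuracy(suggested: list[str], reference: list[str]) -> float:
    """Share of segments whose suggested tag exactly matches the reference tag (assumed rule)."""
    if len(suggested) != len(reference) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(s.strip().lower() == r.strip().lower() for s, r in zip(suggested, reference))
    return hits / len(reference)

def gain_pp(baseline_acc: float, improved_acc: float) -> float:
    """Accuracy gain expressed in percentage points."""
    return 100 * (improved_acc - baseline_acc)

if __name__ == "__main__":
    # Invented toy tags for four segments; not results from the paper.
    reference = ["highway=residential", "highway=secondary", "highway=service", "highway=residential"]
    baseline = ["highway=residential", "highway=tertiary", "highway=residential", "highway=residential"]
    with_context = ["highway=residential", "highway=secondary", "highway=service", "highway=unclassified"]
    b, c = accuracy(baseline, reference), accuracy(with_context, reference)
    print(f"baseline {b:.2f}, with context {c:.2f}, gain {gain_pp(b, c):.1f} pp")
```

Exact matching is the strictest possible rule; a looser scheme that gives credit for semantically close road categories would generally yield higher scores, so the matching rule should be stated alongside any reported accuracy.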
