Measuring Geographic Diversity of Foundation Models with a Natural Language--based Geo-guessing Experiment on GPT-4

Zilong Liu, Krzysztof Janowicz, Kitty Currier, Meilin Shi

TL;DR

This study treats GPT-4 as a geographic knowledge base and introduces a natural language geo-guessing test that uses English DBpedia abstracts to evaluate how well geographic feature types are represented. Comparing unimodal and multimodal GPT-4 variants under zero-shot prompts, it finds insufficient knowledge of several feature types at the global level, inter-regional disparities for UNESCO World Heritage Sites at the local level, and inter-model disparities between the two variants. The study suggests retrieval-augmented generation and broader ground-truth corpora as avenues to improve geographic knowledge in foundation models, highlights geographic diversity as an ethical principle in GIScience, and calls for richer probing and knowledge-grounding strategies in AI systems.

Abstract

Generative AI based on foundation models provides a first glimpse into the world as represented by machines trained on vast amounts of multimodal data. If we consider the resulting models as knowledge bases in their own right, this may open up new avenues for understanding places through the lens of machines. In this work, we adopt this perspective and select GPT-4, a state-of-the-art representative of the family of multimodal large language models, to study its geographic diversity, that is, how well geographic features are represented. Using DBpedia abstracts as a ground-truth corpus for probing, our natural language--based geo-guessing experiment shows that GPT-4 may currently encode insufficient knowledge about several geographic feature types at the global level. At the local level, we observe not only this insufficiency but also inter-regional disparities in GPT-4's geo-guessing performance on UNESCO World Heritage Sites, which carry significance for both local and global populations, and these inter-regional disparities may become smaller as the geographic scale increases. Moreover, whether geo-guessing performance is assessed at the global or the local level, we find inter-model disparities when comparing GPT-4's unimodal and multimodal variants. We hope this work can initiate a discussion on geographic diversity as an ethical principle within the GIScience community in the face of global socio-technical challenges.
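The abstract describes the geo-guessing experiment only at a high level. The sketch below shows one possible way to frame such an experiment and to measure inter-regional disparities. It assumes that each feature's English abstract is shown to the model with the feature's name masked, that the model is asked to name the feature in a zero-shot prompt, and that a guess counts as correct only if it matches the true name after simple normalization. The prompt wording, the `[MASK]` token, and the functions `mask_name`, `build_prompt`, `is_correct`, and `accuracy_by_region` are hypothetical and are not the authors' protocol; the region labels and records in the demo are toy values, not results from the paper.

```python
import re
from collections import defaultdict

MASK = "[MASK]"  # hypothetical placeholder token for the hidden feature name


def mask_name(abstract: str, name: str, mask: str = MASK) -> str:
    """Replace every case-insensitive occurrence of the feature name with a mask token."""
    return re.sub(re.escape(name), mask, abstract, flags=re.IGNORECASE)


def build_prompt(masked_abstract: str) -> str:
    """Hypothetical zero-shot prompt: an instruction plus the clue, with no worked examples."""
    return (
        "The following text describes a geographic feature whose name has been "
        f"replaced by {MASK}. Reply with the name of the feature only.\n\n"
        f"Text: {masked_abstract}"
    )


def normalize(text: str) -> str:
    """Lower-case, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def is_correct(guess: str, name: str) -> bool:
    """Strict scoring: a guess counts only if it equals the true name after normalization."""
    return normalize(guess) == normalize(name)


def accuracy_by_region(records):
    """Map each region to its share of correct guesses.

    records: iterable of (region, true_name, model_guess) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for region, name, guess in records:
        totals[region] += 1
        hits[region] += int(is_correct(guess, name))
    return {region: hits[region] / totals[region] for region in totals}


if __name__ == "__main__":
    # Example sentence for illustration; not a verbatim DBpedia abstract.
    abstract = "The Gulf of Thailand is a shallow inlet in the southwestern South China Sea."
    print(build_prompt(mask_name(abstract, "Gulf of Thailand")))
    # Toy records with hypothetical region labels; these are not results from the paper.
    toy = [
        ("Region A", "Mediterranean Sea", "mediterranean sea"),
        ("Region A", "Gulf of Thailand", "Gulf of Siam"),
        ("Region B", "Gulf of Thailand", "Gulf of Thailand"),
    ]
    print(accuracy_by_region(toy))  # {'Region A': 0.5, 'Region B': 1.0}
```

Grouping accuracy by region is what makes inter-regional disparities visible. Exact matching is deliberately strict and undercounts correct answers given under another name: "Gulf of Siam" is a historical name for the Gulf of Thailand but is scored as wrong here, so a fuller evaluation would likely need alias lists or manual review.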

Paper Structure

This paper contains 9 sections, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: The retrieval process of a dbo:Sea feature dbr:Mediterranean_Sea and its abstract from DBpedia
  • Figure 2: An example geo-guessing experiment about a dbo:Bay feature dbr:Gulf_of_Thailand, implemented with the Chat mode in OpenAI Playground
  • Figure 3: The hierarchy of DBpedia's dbo:Place subclasses used in our work-in-progress
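Figure 1 shows how a feature of a given dbo:Place subclass and its abstract are retrieved from DBpedia. A minimal Python sketch of that retrieval step follows. It assumes DBpedia's public SPARQL endpoint at https://dbpedia.org/sparql and a simple query shape that selects instances of one subclass together with their English dbo:abstract; the authors' actual query is not given here, and the helper names `build_query` and `fetch` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# DBpedia's public SPARQL endpoint (assumed to be reachable and to accept GET requests).
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def build_query(place_class: str, limit: int = 100) -> str:
    """Build a SPARQL query for instances of a dbo:Place subclass and their English abstracts."""
    # Only accept identifier-like class names (e.g. "Sea", "Bay") so that
    # arbitrary text cannot be spliced into the query string.
    if not place_class.isidentifier():
        raise ValueError(f"not a valid class local name: {place_class!r}")
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?feature ?abstract WHERE {{
  ?feature a dbo:{place_class} ;
           dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}}
LIMIT {int(limit)}"""


def fetch(query: str, timeout: float = 30.0) -> list[tuple[str, str]]:
    """Send the query to the endpoint and return (resource URI, abstract) pairs."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(f"{DBPEDIA_SPARQL}?{params}", timeout=timeout) as resp:
        data = json.load(resp)
    # Standard SPARQL JSON results: each binding maps a variable name to {"value": ...}.
    return [
        (b["feature"]["value"], b["abstract"]["value"])
        for b in data["results"]["bindings"]
    ]
```

For example, `fetch(build_query("Sea", limit=10))` would return up to ten (resource URI, abstract) pairs; whether http://dbpedia.org/resource/Mediterranean_Sea appears among them depends on the endpoint's result ordering. Separating query construction from the network call keeps the query easy to inspect and test offline.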