Large Language Models are Geographically Biased

Rohin Manvi, Samar Khanna, Marshall Burke, David Lobell, Stefano Ermon

TL;DR

This work introduces geographic bias as a lens for evaluating LLMs. It shows that zero-shot prompts can yield geospatial predictions that correlate strongly with ground truth, while also revealing systematic regional biases, especially against areas with lower socioeconomic conditions on sensitive subjective topics. The authors develop a bias score that combines rank correlation, rating dispersion, and answer rate, and show that bias varies across models and topics; rating statistics derived from log probabilities make subtle biases detectable. Experiments across objective, subjective, and geographically independent topics reveal consistent regional biases and quantify their magnitude, underscoring the need for bias-aware data curation and prompting. The findings bear on the deployment of LLMs in globally diverse contexts and motivate mitigation strategies that keep geospatial reasoning from perpetuating stereotypes.

Abstract

Large Language Models (LLMs) inherently carry the biases contained in their training corpora, which can lead to the perpetuation of societal harm. As the impact of these foundation models grows, understanding and evaluating their biases becomes crucial to achieving fairness and accuracy. We propose to study what LLMs know about the world we live in through the lens of geography. This approach is particularly powerful as there is ground truth for the numerous aspects of human life that are meaningfully projected onto geographic space such as culture, race, language, politics, and religion. We show various problematic geographic biases, which we define as systemic errors in geospatial predictions. Initially, we demonstrate that LLMs are capable of making accurate zero-shot geospatial predictions in the form of ratings that show strong monotonic correlation with ground truth (Spearman's $\rho$ of up to 0.89). We then show that LLMs exhibit common biases across a range of objective and subjective topics. In particular, LLMs are clearly biased against locations with lower socioeconomic conditions (e.g., most of Africa) on a variety of sensitive subjective topics such as attractiveness, morality, and intelligence (Spearman's $\rho$ of up to 0.70). Finally, we introduce a bias score to quantify this and find that there is significant variation in the magnitude of bias across existing LLMs. Code is available on the project website: https://rohinmanvi.github.io/GeoLLM
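
The abstract reports agreement between LLM ratings and ground truth as Spearman's rank correlation. As a minimal sketch of how such a number can be computed, the snippet below ranks both series (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names and the five sample values are hypothetical illustrations, not the authors' code or data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (norm_x * norm_y)


# Hypothetical example: model ratings for five locations vs. a ground-truth metric.
llm_ratings = [2.1, 7.8, 5.0, 9.3, 3.4]
ground_truth = [10, 300, 120, 250, 40]
rho = spearman_rho(llm_ratings, ground_truth)
```

Because only ranks enter the calculation, the ratings do not need to be on the same scale as the ground truth; any monotonic relationship between the two yields a high $\rho$.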

Paper Structure (38 sections, 3 equations, 8 figures, 8 tables)

Figures (8)

  • Figure 1: The mean rank plots illustrate agreement across LLM predictions, with areas of green and red highlighting regions consistently rated higher or lower respectively. For objective topics, the maps demonstrate the zero-shot geographic knowledge of LLMs. The sensitive subjective topics reveal agreement that indicates strong socioeconomic biases. The geographically independent topics serve as the control.
  • Figure 1: Performance (Spearman's $\rho$) of all models on all objective topics with ground truth.
  • Figure 2: Example prompt for zero-shot geospatial predictions. It includes a GeoLLM (Manvi et al., 2023) prompt as well as a prefix that provides context about the task.
  • Figure 2: Correlation (Spearman's $\rho$) of ratings on sensitive subjective topics with infant survival rate (inverse of our Infant Mortality Rate topic). This demonstrates clear bias towards areas with better socioeconomic conditions. These correlations are strongest among the topics we have ground truth for, including Population Density, Nighttime Light Intensity, and Built-Up to Non Built-Up Area Ratio.
  • Figure 3: Zero-shot GPT-4 Turbo comparison with ground truth.
  • ...and 3 more figures