Richer Output for Richer Countries: Uncovering Geographical Disparities in Generated Stories and Travel Recommendations

Kirti Bhagat, Kinshuk Vasisht, Danish Pruthi

TL;DR

The paper investigates geographical biases in large language models by evaluating two geo-anchored tasks (travel recommendations and story generation) across five models and hundreds of thousands of locations. Using Geonames-sourced locations, curated prompts, and metrics for Uniqueness, Informativeness, and Emotions, it finds that outputs for wealthier regions are more unique and richer in geographical references, while stories for poorer regions focus more on hardship and contain fewer location references. The study also links these disparities to socioeconomic indicators such as GDP per capita and to the frequency of country mentions in training data, and finds that larger models do not automatically mitigate the biases. The findings underscore the need for geographically diverse training corpora and evaluation datasets so that model outputs are equitable and representative in real-world applications.

Abstract

While a large body of work inspects language models for biases concerning gender, race, occupation and religion, biases of a geographical nature are relatively less explored. Some recent studies benchmark the degree to which large language models encode geospatial knowledge. However, the impact of the encoded geographical knowledge (or lack thereof) on real-world applications has not been documented. In this work, we examine large language models for two common scenarios that require geographical knowledge: (a) travel recommendations and (b) geo-anchored story generation. Specifically, we study five popular language models. Across about 100K travel requests and 200K story generations, we observe that travel recommendations corresponding to poorer countries are less unique and contain fewer location references, and that stories from these regions more often convey emotions of hardship and sadness than those from wealthier nations.

Paper Structure

This paper contains 22 sections, 2 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: World map with country-wise analysis of responses generated by GPT-4. Left: Average count of geographical entities mentioned in generated stories (Pearson $r = 0.5$ with GDP per capita). Right: Uniqueness scores for travel recommendations (Pearson $r = 0.4$ with GDP per capita).
  • Figure 2: Percentage of stories generated by GPT-4 depicting the emotions of sadness (left, Pearson $r = -0.45$) and hardship (right, Pearson $r = -0.54$) for each country vs. GDP per capita.
  • Figure 3: Example prompt and response generated for a location in Lebanon. We notice few geographical references and regional artifacts mentioned (highlighted in red) in the response, leading to the low uniqueness score of 24.
  • Figure 4: Example prompt and response generated for a location in Italy. We notice many geographical references and regional artifacts mentioned (highlighted in red) in the response, leading to the high uniqueness score of 1296.
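
The captions for Figures 1 and 2 report Pearson correlations between a per-country metric and GDP per capita. As a rough illustration of that statistic only, the sketch below computes Pearson $r$ in plain Python. The function name `pearson_r` and the sample values are hypothetical, not taken from the paper, and the paper's own preprocessing (for example, how GDP is scaled or how countries are aggregated) is not shown here and may differ.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-country values, invented purely for illustration.
gdp_per_capita = [1200.0, 4500.0, 9800.0, 31000.0, 52000.0]
avg_geo_entities = [2.1, 2.6, 3.0, 4.2, 4.0]

r = pearson_r(gdp_per_capita, avg_geo_entities)
print(f"Pearson r = {r:.2f}")
```

A positive value means the metric tends to rise with GDP per capita, as in Figure 1; a negative value means it tends to fall, as with the sadness and hardship percentages in Figure 2.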