
Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models

Wenxuan Wang, Wenxiang Jiao, Jingyuan Huang, Ruyi Dai, Jen-tse Huang, Zhaopeng Tu, Michael R. Lyu

TL;DR

The paper investigates cultural dominance in large language models caused by English-centric training data. It builds a benchmark of concrete (holidays, songs, etc.) and abstract (values, opinions) cultural objects, and it defines the In-Culture Score and cross-language distance metrics to quantify the bias. Experiments show pronounced English-culture dominance across GPT models, especially for non-English prompts. text-davinci-003 is the least affected and GPT-4 the most affected. The paper also demonstrates two mitigation strategies, diverse multilingual pretraining and culture-aware prompting, each with distinct trade-offs. These results underscore the need for culture-sensitive LLM development and deployment.

Abstract

This paper identifies a cultural dominance issue in large language models (LLMs) such as ChatGPT, which stems from the predominant use of English data in model training. When users ask questions in non-English languages, LLMs often give inappropriate answers tied to English culture rather than to the culture the user expects. To systematically evaluate the cultural dominance issue, we build a benchmark of concrete (e.g., holidays and songs) and abstract (e.g., values and opinions) cultural objects. Empirical results show that representative GPT models suffer from cultural dominance. GPT-4 is the most affected and text-davinci-003 the least. Our study emphasizes the need to critically examine cultural dominance and ethical considerations in the development and deployment of LLMs. We show that two straightforward methods can significantly mitigate the cultural dominance issue in LLMs: one in model development (pretraining on more diverse data) and one in deployment (e.g., culture-aware prompting).

Paper Structure (35 sections, 1 equation, 2 figures, 32 tables)


Figures (2)

  • Figure 1: Analyses of ChatGPT's responses to queries in different languages. Left: the ratio of responses related to the corresponding culture. Right: the ratio of responses related to English culture. For non-English queries, ChatGPT's responses relate more to English culture than to the corresponding culture, which shows that English culture predominates in ChatGPT's outputs.
  • Figure 2: References (human results) for each survey.
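The two panels of Figure 1 reduce to simple proportions over labeled responses. The sketch below shows that arithmetic. It assumes that each model response has already been annotated with a single culture label, and it does not cover that labeling step. The function name `culture_ratios` and the sample annotations are hypothetical. They are not the paper's data, and this is not necessarily its exact In-Culture Score definition.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def culture_ratios(labels: Iterable[str], target: str,
                   reference: str = "English") -> tuple[float, float]:
    """Return (share of responses labeled `target`, share labeled `reference`).

    Assumption: every response carries exactly one culture label
    (e.g. "Chinese", "English", "Other"), assigned beforehand.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("need at least one labeled response")
    counts = Counter(labels)
    n = len(labels)
    # Left panel of Figure 1: ratio tied to the query language's own culture.
    # Right panel: ratio tied to English culture.
    return counts[target] / n, counts[reference] / n


if __name__ == "__main__":
    # Hypothetical annotations, grouped by query language (illustrative only).
    annotations = {
        "Chinese": ["English", "Chinese", "English", "Other"],
        "Spanish": ["Spanish", "English", "Spanish", "Spanish"],
    }
    for lang, labels in annotations.items():
        in_culture, english = culture_ratios(labels, target=lang)
        print(f"{lang}: in-culture={in_culture:.2f}, English={english:.2f}")
```

On the made-up Chinese annotations, the English share (0.50) exceeds the in-culture share (0.25). That is the qualitative pattern Figure 1 reports for non-English queries. For English queries, `target` and `reference` coincide, so both ratios are equal.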