Investigating Cultural Alignment of Large Language Models

Badr AlKhamissi; Muhammad ElNokrashy; Mai AlKhamissi; Mona Diab

TL;DR

This work introduces a framework for quantifying cultural alignment in large language models. It simulates a sociological survey (the World Values Survey) for two cultures, Egypt and the United States, and measures how prompting language, pretraining data composition, personas, and topics affect alignment. It defines hard and soft metrics that compare model outputs with ground-truth survey responses, and it proposes Anthropological Prompting, which grounds the model's reasoning in emic and etic perspectives. Key findings include an Anglocentric bias, a measurable effect of prompting in a culture's dominant language, and demographic gaps that leave digitally marginalized groups underrepresented; anthropological prompting improves alignment for underrepresented personas. The study highlights the need for balanced multilingual pretraining and careful cross-lingual transfer design so that LLM outputs better represent diverse human experiences and cultures.

Abstract

The intricate relationship between language and culture has long been a subject of exploration within the realm of linguistic anthropology. Large Language Models (LLMs), promoted as repositories of collective human knowledge, raise a pivotal question: do these models genuinely encapsulate the diverse knowledge adopted by different cultures? Our study reveals that these models demonstrate greater cultural alignment along two dimensions -- firstly, when prompted with the dominant language of a specific culture, and secondly, when pretrained with a refined mixture of languages employed by that culture. We quantify cultural alignment by simulating sociological surveys, comparing model responses to those of actual survey participants as references. Specifically, we replicate a survey conducted in various regions of Egypt and the United States through prompting LLMs with different pretraining data mixtures in both Arabic and English with the personas of the real respondents and the survey questions. Further analysis reveals that misalignment becomes more pronounced for underrepresented personas and for culturally sensitive topics, such as those probing social values. Finally, we introduce Anthropological Prompting, a novel method leveraging anthropological reasoning to enhance cultural alignment. Our study emphasizes the necessity for a more balanced multilingual pretraining dataset to better represent the diversity of human experience and the plurality of different cultures with many implications on the topic of cross-lingual transfer.

Paper Structure (43 sections, 2 equations, 14 figures, 17 tables)

Introduction
Research Questions
Prompting Language and Cultural Alignment:
Pretraining Data Composition:
Personas and Cultural Topics:
Finetuning Models to Induce Cross-Lingual Knowledge Transfer:
Anthropological Preliminaries
Working Assumptions
Language $\rightarrow$ Culture
Culture $\rightarrow$ Language
Experimental Setup
World Values Survey (WVS)
Survey Participants
Filtering Participants
Personas: Role-Playing for LLMs
...and 28 more sections

Figures (14)

Figure 1: Our framework for measuring the cultural alignment of LLM knowledge/output and ground-truth cultural data collected through survey responses.
Figure 2: Template used when querying models in English. (Left) The model is first instructed to respond under a specific persona along the demographic parameters highlighted in red. (Right) The rest of the prompt instructs the model to follow the perspective of the persona closely, respond in a specific format (only the index of the answer), and avoid any extraneous commentary.
Figure 3: Cultural alignment as a function of a subject's Sex, Education Level, Social Class, and Age Range. Results are averaged across the models, prompting languages and surveys used in this work. L-Middle and U-Middle are Lower Middle and Upper Middle Class respectively.
Figure 4: Alignment of GPT-3.5 with the Egypt survey using both the soft and hard metrics by theme as a function of the prompting language (Arabic vs. English).
Figure 5: Anthropological prompting improves alignment for underrepresented personas compared to Vanilla prompting. Results on GPT-3.5 using English prompting. More in the appendix on anthropological prompting.
...and 9 more figures
