Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas

Louis Kwok; Michal Bravansky; Lewis D. Griffin

TL;DR

Problem: evaluating cultural adaptability of LLMs. Approach: replicate a multinational persuasion experiment by simulating personas with nationality cues using GPT-3.5 and compare to human data. Findings: country-of-residence prompts improve alignment, while native-language prompting can degrade fidelity; Greek/Hebrew prompts are particularly detrimental. Significance: informs prompt design for culturally aware AI and motivates broader benchmarking across models.

Abstract

The success of Large Language Models (LLMs) in multicultural environments hinges on their ability to understand users' diverse cultural backgrounds. We measure this capability by having an LLM simulate human profiles representing various nationalities within the scope of a questionnaire-style psychological experiment. Specifically, we employ GPT-3.5 to reproduce the reactions of 7,286 participants from 15 countries to persuasive news articles, and compare the results with a dataset of real participants sharing the same demographic traits. Our analysis shows that specifying a person's country of residence improves GPT-3.5's alignment with their responses. In contrast, native-language prompting introduces shifts that significantly reduce overall alignment, with some languages particularly impairing performance. These findings suggest that while direct nationality information enhances the model's cultural adaptability, native-language cues do not reliably improve simulation fidelity and can detract from the model's effectiveness.

Paper Structure (22 sections, 2 equations, 8 figures, 4 tables)

Introduction
Related Work
Multilingual and Multi-Cultural Aspects of Large Language Models
Human-like Characteristics of Large Language Models
The Role of LLMs in Synthetic Behavior Simulation
Methodology
Synthesizing Personas
Experiment 1: Effect of Indicating Nationality
Experiment 2: Effect of using a Single Language to Simulate Multinational Participants
Experiment 3: Effect of using Native Languages to Simulate Multinational Participants
Prompting Procedure
Evaluation
Results
Experiment 1: Effect of Indicating Nationality
Experiment 2: Effect of using a Single Language to Simulate Multinational Participants
...and 7 more sections

Figures (8)

Figure 1: A specific human profile is defined, enriched with nationality or language, and evaluated against the ground-truth results from Bos et al. (2020).
Figure 2: Format of a sample prompt used in the GPT-3.5 simulation. The prompt is intended to read like a semi-complete questionnaire, with the final numeric response (highlighted) provided by GPT-3.5. Key sections of the prompt are indicated by letters. a) Demographic information of the simulated participant. b) Relative deprivation ratings of the simulated participant in response to probe statements. c) The version of the news article shown to the simulated participant. In this example the anti-elite, anti-immigrant version is shown. d) The final instruction and a probe statement for GPT-3.5 to provide a single numerical response to.
Figure 3: Sign agreement rates for monolingual prompting in 12 different languages. Agreement rates significantly ($p < 0.05$) greater than chance are shown with black bars.
Figure 4: Sign agreement rates for monolingual and poly-lingual prompting. Vertical lines on bars indicate +/- 1 s.d. of variation. Bars are paler when their sign agreement is not significantly ($p < 0.05$) greater than chance.
Figure 5: News article template without populist framing.
...and 3 more figures
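
The captions for Figures 3 and 4 refer to sign agreement rates tested against chance. The sketch below illustrates one way such a metric could be computed. It rests on assumptions not confirmed by this page: that sign agreement is the fraction of paired effects (simulated vs. human) sharing the same sign, that chance is 0.5, and that significance is assessed with a one-sided exact binomial test. The function names and the effect values are hypothetical and are not taken from the paper.

```python
from math import comb


def sign_agreement(simulated, human):
    """Count paired effects whose signs match; pairs containing a zero are skipped.

    Assumed definition for illustration only, not the paper's stated one.
    Returns (number of matches, number of pairs compared).
    """
    pairs = [(s, h) for s, h in zip(simulated, human) if s != 0 and h != 0]
    if not pairs:
        raise ValueError("no non-zero effect pairs to compare")
    matches = sum((s > 0) == (h > 0) for s, h in pairs)
    return matches, len(pairs)


def binomial_p_greater(matches, n, chance=0.5):
    """One-sided exact p-value: P(X >= matches) for X ~ Binomial(n, chance)."""
    return sum(
        comb(n, k) * chance**k * (1 - chance) ** (n - k)
        for k in range(matches, n + 1)
    )


# Made-up illustrative effects, NOT data from the paper.
simulated_effects = [0.4, -0.2, 0.1, 0.3, -0.5, 0.2, 0.6, -0.1]
human_effects = [0.3, -0.1, 0.2, 0.1, -0.4, -0.3, 0.5, -0.2]

matches, n = sign_agreement(simulated_effects, human_effects)
rate = matches / n
p_value = binomial_p_greater(matches, n)
print(f"sign agreement = {rate:.3f}, one-sided p = {p_value:.4f}")
```

With these toy values, seven of the eight pairs agree in sign, so the rate would count as significantly above chance at the $p < 0.05$ threshold used in the captions. The paper's own test may differ.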
