An Evaluation of Cultural Value Alignment in LLM

Nicholas Sukiennik; Chen Gao; Fengli Xu; Yong Li

TL;DR

This paper conducts a large-scale evaluation of cultural value alignment in LLMs across 20 countries and 10 models using Hofstede's Values Survey Module (VSM). It introduces the Deviation Ratio, an alignment metric that accounts for the skew of model outputs toward a global-average culture, and analyzes the effects of prompt language, model origin, and external factors such as GDP and web-content share. Key findings: model outputs show a moderate bias toward the global average, the United States is the best-aligned country, and GLM-4 is the best-aligning model; prompt language and data availability strongly influence alignment. The work draws out implications for culturally adaptive alignment, cautions that LLMs may propagate cross-cultural biases, and suggests directions for richer, more globally representative training data. These insights inform practical strategies for producing culturally considerate LLM outputs and identify avenues for future cross-cultural benchmarking.

Abstract

LLMs are increasingly deployed as intelligent agents in scenarios that involve human interaction, which raises a critical concern: are LLMs faithful to the variations in culture across regions? Several works have investigated this question in various ways and found biases in the cultural representations of LLM outputs. To gain a more comprehensive view, we conduct the first large-scale evaluation of LLM culture, assessing 20 countries' cultures and languages across ten LLMs. Using a renowned cultural values questionnaire and carefully analyzing LLM output against human ground-truth scores, we thoroughly study LLMs' cultural alignment across countries and among individual models. Our findings show that the output over all models represents a moderate cultural middle ground. Given this overall skew, we propose an alignment metric, which reveals that the United States is the best-aligned country and that GLM-4 has the best ability to align to cultural values. Deeper investigation sheds light on the influence of model origin, prompt language, and value dimensions on cultural output. Specifically, models align better with the US than with China, regardless of where they originate. The conclusions provide insight into how LLMs can be better aligned to various cultures, and they provoke further discussion of the potential for LLMs to propagate cultural bias and of the need for more culturally adaptable models.
