The Cultural Gene of Large Language Models: A Study on the Impact of Cross-Corpus Training on Model Values and Biases
Emanuel Z. Fenech-Borg, Tilen P. Meznaric-Kos, Milica D. Lekovic-Bojovic, Arni J. Hentze-Djurhuus
TL;DR
This study formalizes the notion of a 'cultural gene' in large language models and empirically investigates how cross-corpus training shapes value-oriented behavior. Using the Cultural Probe Dataset, a set of 200 prompts targeting Individualism-Collectivism ($IDV$) and Power Distance ($PDI$), the authors compare a Western-centric model (GPT-4) and an Eastern-centric model (ERNIE Bot). They find statistically significant differences in cultural dimension scores ($p<0.001$) and distinct Cultural Alignment Index values, with GPT-4 aligning more closely with Hofstede's national scores for the USA and ERNIE Bot with those for China. Quantitatively, GPT-4 leans toward individualism and low power distance, while ERNIE Bot shows collectivistic and higher-power-distance tendencies, consistent with the view of LLMs as statistical mirrors of their training data. Qualitative analyses of dilemma resolution and authority-related judgments corroborate these findings. The results carry implications for global AI deployment and ethics, and point to the need for culturally aware evaluation frameworks and, potentially, a plurality of context-aware models to avoid algorithmic cultural hegemony.
Abstract
Large language models (LLMs) are deployed globally, yet their underlying cultural and ethical assumptions remain underexplored. We propose the notion of a "cultural gene", a systematic value orientation that LLMs inherit from their training corpora, and introduce a Cultural Probe Dataset (CPD) of 200 prompts targeting two classic cross-cultural dimensions: Individualism-Collectivism (IDV) and Power Distance (PDI). Using standardized zero-shot prompts, we compare a Western-centric model (GPT-4) and an Eastern-centric model (ERNIE Bot). Human annotation shows significant and consistent divergence across both dimensions. GPT-4 exhibits individualistic and low-power-distance tendencies (IDV score ≈ 1.21; PDI score ≈ -1.05), while ERNIE Bot shows collectivistic and higher-power-distance tendencies (IDV ≈ -0.89; PDI ≈ 0.76); differences are statistically significant (p < 0.001). We further compute a Cultural Alignment Index (CAI) against Hofstede's national scores and find GPT-4 aligns more closely with the USA (e.g., IDV CAI ≈ 0.91; PDI CAI ≈ 0.88) whereas ERNIE Bot aligns more closely with China (IDV CAI ≈ 0.85; PDI CAI ≈ 0.81). Qualitative analyses of dilemma resolution and authority-related judgments illustrate how these orientations surface in reasoning. Our results support the view that LLMs function as statistical mirrors of their cultural corpora and motivate culturally aware evaluation and deployment to avoid algorithmic cultural hegemony.
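As an illustration of the kind of comparison summarized above, the following minimal sketch computes Welch's t statistic for two sets of per-prompt annotation scores. It is not the authors' procedure: the abstract does not state which significance test was used, and the -2 to +2 rating scale, the function name `welch_t`, and the sample ratings are all assumptions made here for illustration only.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Per-group squared standard errors (sample variance divided by group size).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical annotator ratings on an assumed -2..+2 IDV scale.
# These values are made up for illustration; they are NOT the paper's data.
model_a_idv = [2, 1, 1, 2, 1, 0, 2, 1]
model_b_idv = [-1, -1, 0, -2, -1, -1, 0, -1]

t_stat, dof = welch_t(model_a_idv, model_b_idv)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Welch's variant is chosen in this sketch because it does not assume the two models' score variances are equal; a p-value could then be obtained from the t distribution with the computed degrees of freedom.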
