Prompt Programming for Cultural Bias and Alignment of Large Language Models

Maksim Eren; Eric Michalak; Brian Cook; Johnny Seales

Abstract

Culture shapes reasoning, values, prioritization, and strategic decision-making, yet large language models (LLMs) often exhibit cultural biases that misalign with target populations. As LLMs are increasingly used for strategic decision-making, policy support, and document engineering tasks such as summarization, categorization, and compliance-oriented auditing, improving cultural alignment is important for ensuring that downstream analyses and recommendations reflect target-population value profiles rather than default model priors. Previous work introduced a survey-grounded cultural alignment framework and showed that culture-specific prompting can reduce misalignment, but it primarily evaluated proprietary models and relied on manual prompt engineering. In this paper, we validate and extend that framework by reproducing its social-science survey-based projection and distance metrics on open-weight LLMs, testing whether the same cultural skew and the same benefits of culture conditioning persist outside closed LLM systems. Building on this foundation, we introduce the use of prompt programming with DSPy for this problem, treating prompts as modular, optimizable programs, to systematically tune cultural conditioning by optimizing against cultural-distance objectives. In our experiments, prompt optimization often improves upon manual cultural prompt engineering, suggesting that prompt compilation with DSPy can provide a more stable and transferable route to culturally aligned LLM responses.

Paper Structure (10 sections, 12 equations, 3 figures)

Introduction
Related Work
Methods
Cultural map of countries/territories
Open-source model projection into the IVS benchmark space
Country-level cultural distance under three prompting regimes
Culture prompt programming with DSPy
Results
Discussion and Future Work
Conclusion

Figures (3)

Figure 1: Cultural map of countries/territories in the IVS benchmark space (Survival vs. Self-Expression; Traditional vs. Secular values). We overlay points derived from open-weight model responses (Llama 3.3 70B, Llama 4 16x17B, Gemma 3 27B, GPT-OSS 20B, GPT-OSS 120B) to the same IVS items, following the projection procedure of Tao et al. [pgae346]. The colored area around each point is purely a visual aid that highlights the clustering of each category.
Figure 2: Country-level cultural distance for open-source LLMs under three prompting regimes: (i) without culture conditioning, (ii) with manual culture prompt engineering, and (iii) with culture prompt programming using DSPy. Consistent with Tao et al. [pgae346], cultural distance is measured, for each country/territory $c$, as the Euclidean distance between the model's projected coordinate in the IVS cultural-map space (Figure 1) and the human reference coordinate $\boldsymbol{\nu}^{\text{IVS}}_{c}$. The three regimes correspond to a generic (non-national, without culture prompt engineering) model point compared to all countries, country-conditioned prompting using a fixed manual prefix, and country-conditioned prompting using a DSPy-compiled prompt program. Smaller distances indicate closer alignment to the country/territory benchmark, and changes across regimes reflect how cultural conditioning and prompt programming shift the model toward $\boldsymbol{\nu}^{\text{IVS}}_{c}$.
Figure 3: Per-country movement in the Inglehart--Welzel IVS benchmark space (PC1$'$/PC2$'$; Survival vs. Self-Expression and Traditional vs. Secular). Each mini-panel corresponds to one country/territory $c$ (grouped by cultural zone; background tint) and compares the generic, non-national gpt-oss:120b projection $\boldsymbol{\mu}_{m,\varnothing}$ (Generic) to the aligned projection $\boldsymbol{\mu}^{\mathrm{DSPy}}_{m,c}$ (Aligned), obtained via Culture Prompt Programming with DSPy (MIPROv2; proposer gpt-oss:120b), relative to the human reference point $\boldsymbol{\nu}^{\mathrm{IVS}}_{c}$ (Human). The arrow indicates the shift from $\boldsymbol{\mu}_{m,\varnothing}$ to $\boldsymbol{\mu}^{\mathrm{DSPy}}_{m,c}$, and the dashed segment indicates the remaining discrepancy $\lVert \boldsymbol{\mu}^{\mathrm{DSPy}}_{m,c}-\boldsymbol{\nu}^{\mathrm{IVS}}_{c}\rVert_2$. Each panel reports $\Delta(c)=\lVert \boldsymbol{\mu}_{m,\varnothing}-\boldsymbol{\nu}^{\mathrm{IVS}}_{c}\rVert_2-\lVert \boldsymbol{\mu}^{\mathrm{DSPy}}_{m,c}-\boldsymbol{\nu}^{\mathrm{IVS}}_{c}\rVert_2$, where $\Delta(c)>0$ indicates that DSPy alignment moves the model closer to the human benchmark for $c$.
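The distance and improvement quantities named in the captions for Figures 2 and 3 can be illustrated with a minimal sketch. It assumes each projection is a 2-D point in the cultural-map space; the function names and the coordinates below are hypothetical, chosen only to make the arithmetic easy to follow, and are not values from the paper.

```python
import math

def cultural_distance(model_point, human_point):
    """Euclidean distance between a model projection and the human reference point."""
    return math.dist(model_point, human_point)

def delta(generic_point, aligned_point, human_point):
    """Delta(c): distance of the generic projection minus distance of the aligned one.

    A positive value means the aligned projection is closer to the human benchmark.
    """
    return (cultural_distance(generic_point, human_point)
            - cultural_distance(aligned_point, human_point))

# Hypothetical coordinates for one country/territory c (not data from the paper).
human = (0.0, 0.0)    # stands in for nu^IVS_c
generic = (3.0, 4.0)  # stands in for mu_{m, empty} (no culture conditioning)
aligned = (0.0, 1.0)  # stands in for mu^DSPy_{m,c} (after prompt programming)

before = cultural_distance(generic, human)    # distance without conditioning
after = cultural_distance(aligned, human)     # remaining discrepancy (dashed segment)
improvement = delta(generic, aligned, human)  # Delta(c)

print(f"before={before:.2f} after={after:.2f} delta={improvement:.2f}")
```

With these illustrative points the generic projection lies farther from the reference than the aligned one, so $\Delta(c)$ is positive; a negative value would mean that alignment moved the model away from the benchmark.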
