Cultural Alignment in Large Language Models Using Soft Prompt Tuning
Reem I. Masoud, Martin Ferianc, Philip Treleaven, Miguel Rodrigues
TL;DR
This work addresses the problem of aligning large language models with diverse cultural values. It introduces a parameter-efficient method that freezes the model weights and optimizes soft prompt embeddings with Differential Evolution (DE), a black-box optimizer that can handle non-differentiable cultural objectives. The approach uses the six cultural dimensions of Hofstede's VSM13 and computes a loss $L(d_i)$ via the $L_1$ norm against target cultural scores, so it needs neither labeled preference data nor full model fine-tuning. Experiments with Llama-3-8B-Instruct across four countries show that DE-optimized prompts significantly reduce the VSM13 loss compared to the Naive and ICL baselines. CultureBench accuracy, however, shows mixed results, which highlights a trade-off between dimension alignment and practical cultural knowledge. The study points to a promising, scalable path for cultural alignment in AI, with potential applications in social sciences, education, and international relations, and to future work on multitask prompting and alternative black-box optimizers.
Abstract
Large Language Model (LLM) alignment conventionally relies on supervised fine-tuning or reinforcement-learning-based alignment frameworks. These methods typically require labeled or preference datasets and involve updating model weights to align the LLM with the training objective or reward model. Meanwhile, in social sciences such as cross-cultural studies, factor analysis is widely used to uncover underlying dimensions or latent variables that explain observed patterns in survey data. Because these measurements derive from survey data and are non-differentiable, the conventional alignment methods above are infeasible for alignment with cultural dimensions. To overcome this, we propose a parameter-efficient strategy that combines soft prompt tuning, which freezes the model parameters while modifying the input prompt embeddings, with Differential Evolution (DE), a black-box optimization method for cases where a differentiable objective is unattainable. This strategy ensures alignment consistency without the need for preference data or model parameter updates, significantly enhancing efficiency and mitigating overfitting. Our method demonstrates significant improvements in Llama-3-8B-Instruct's cultural dimensions across multiple regions, outperforming both the Naive LLM and the In-context Learning (ICL) baseline, and effectively bridges computational models with human cultural nuances.
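To make the combination of a frozen model, a soft prompt, and a black-box optimizer concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a hypothetical stand-in for the LLM and the VSM13 questionnaire (`survey_scores`, a fixed random linear map whose outputs are rounded to discrete steps so that the objective is non-differentiable), a made-up target vector `TARGET`, and an arbitrary embedding size `EMBED_DIM`. Only the soft prompt is searched; the "model" weights `FROZEN_W` are never modified. The optimizer is SciPy's `differential_evolution`.

```python
import numpy as np
from scipy.optimize import differential_evolution

N_DIMS = 6      # number of cultural dimensions (six, as in VSM13)
EMBED_DIM = 8   # toy soft-prompt size (hypothetical, far smaller than a real embedding)

# Stand-in for the frozen LLM: fixed random weights that are never updated.
FROZEN_W = np.random.default_rng(0).normal(size=(N_DIMS, EMBED_DIM))


def survey_scores(soft_prompt):
    """Hypothetical 'frozen model + survey' pipeline (an assumption for illustration).

    Maps a soft prompt to six scores in [0, 100]. Rounding to steps of 5 mimics
    discrete survey answers and makes the objective non-differentiable.
    """
    raw = 50.0 + 50.0 * np.tanh(FROZEN_W @ soft_prompt)
    return np.round(raw / 5.0) * 5.0


def l1_loss(soft_prompt, target):
    """L1 distance between the produced scores and the target cultural scores."""
    return float(np.abs(survey_scores(soft_prompt) - target).sum())


# Made-up target scores for one region (not real Hofstede data).
TARGET = np.array([40.0, 90.0, 60.0, 45.0, 25.0, 70.0])

# Baseline: an all-zero soft prompt, i.e. no steering at all.
naive_loss = l1_loss(np.zeros(EMBED_DIM), TARGET)

# Black-box search over the soft prompt only; no gradients are used.
result = differential_evolution(
    l1_loss,
    bounds=[(-2.0, 2.0)] * EMBED_DIM,
    args=(TARGET,),
    maxiter=200,
    popsize=20,
    seed=0,
    polish=False,  # skip the gradient-based polish step, which assumes smoothness
)

print("naive loss:", naive_loss)
print("DE-optimized loss:", result.fun)
print("scores with optimized prompt:", survey_scores(result.x))
```

In a real setting, each call to the objective would run the frozen LLM on the survey questions with the candidate soft prompt prepended, which is far more expensive than this toy map, so the population size and iteration budget would need to be chosen accordingly.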
