Localized Cultural Knowledge is Conserved and Controllable in Large Language Models

Veniamin Veselovsky, Berke Argin, Benedikt Stroebl, Chris Wendler, Robert West, James Evans, Thomas L. Griffiths, Arvind Narayanan

TL;DR

This work investigates how large language models encode local cultural knowledge that can be surfaced or suppressed depending on prompting. It introduces the explicit-implicit localization gap (EI-Gap) to quantify the difference between explicit cultural prompting and implicit, language-driven localization, showing that explicit prompts improve cultural localization but reduce diversity and increase stereotypes. Through activation patching and contrastive activation steering, the authors identify mid-layer circuits that mediate localization and demonstrate steering vectors that align outputs with target cultures across languages and tasks, with less bias than explicit prompting. The findings suggest a universal, language-agnostic mechanism for cultural localization and point to practical uses in culturally aware translation and customization, as well as further soft control of LLMs.
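The contrastive activation steering mentioned above can be illustrated with a toy sketch. This section does not spell out the paper's exact construction, so the sketch assumes a common recipe: the steering vector is the difference between the mean activation on prompts with cultural context and the mean activation on prompts without it, and it is added to a hidden state scaled by a strength `alpha`. The function names and the toy activations are hypothetical.

```python
# Toy sketch of contrastive activation steering (assumed recipe, not the paper's code).
# Activations are plain lists of floats standing in for one layer's hidden states.

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(with_context, without_context):
    """Assumption: vector = mean(activations with context) - mean(activations without)."""
    mu_pos = mean_vector(with_context)
    mu_neg = mean_vector(without_context)
    return [p - q for p, q in zip(mu_pos, mu_neg)]

def steer(hidden, vector, alpha):
    """Add the scaled steering vector to a hidden state; negative alpha steers away."""
    return [h + alpha * v for h, v in zip(hidden, vector)]

# Hypothetical two-dimensional activations for two prompt pairs.
explicit_acts = [[1.0, 2.0], [3.0, 4.0]]
implicit_acts = [[0.0, 0.0], [2.0, 2.0]]

v = steering_vector(explicit_acts, implicit_acts)
steered = steer([0.5, 0.5], v, alpha=2.0)
```

In a real model the same addition would be applied to the residual stream at a chosen layer during generation; which layer and which `alpha` work best is an empirical question.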

Abstract

Just as humans display language patterns influenced by their native tongue when speaking new languages, LLMs often default to English-centric responses even when generating in other languages. Nevertheless, we observe that local cultural information persists within the models and can be readily activated for cultural customization. We first demonstrate that explicitly providing cultural context in prompts significantly improves the models' ability to generate culturally localized responses. We term the disparity in model performance with versus without explicit cultural context the explicit-implicit localization gap, indicating that while cultural knowledge exists within LLMs, it may not naturally surface in multilingual interactions if cultural context is not explicitly provided. Despite the explicit prompting benefit, however, the answers reduce in diversity and tend toward stereotypes. Second, we identify an explicit cultural customization vector, conserved across all non-English languages we explore, which enables LLMs to be steered from the synthetic English cultural world-model toward each non-English cultural world. Steered responses retain the diversity of implicit prompting and reduce stereotypes to dramatically improve the potential for customization. We discuss the implications of explicit cultural customization for understanding the conservation of alternative cultural world models within LLMs, and their controllable utility for translation, cultural customization, and the possibility of making the explicit implicit through soft control for expanded LLM function and appeal.
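As a minimal sketch of how the explicit-implicit localization gap might be computed, the snippet below assumes that model performance is measured as the fraction of responses judged culturally localized, and that the gap is the explicit rate minus the implicit rate. Both the metric and the helper names are assumptions made for illustration; the abstract only states that the gap is the disparity in performance with versus without explicit cultural context.

```python
# Sketch of an explicit-implicit localization gap under an assumed metric:
# the share of responses that were judged to be culturally localized.

def localization_rate(judgements):
    """Fraction of responses marked as localized (True)."""
    return sum(judgements) / len(judgements)

def ei_gap(explicit_judgements, implicit_judgements):
    """Assumption: gap = explicit localization rate - implicit localization rate."""
    return localization_rate(explicit_judgements) - localization_rate(implicit_judgements)

# Hypothetical judgements for the same four questions asked in a non-English language.
explicit = [True, True, True, False]    # cultural context stated in the prompt
implicit = [True, False, False, False]  # context conveyed only by the prompt language

gap = ei_gap(explicit, implicit)
```

Under this reading, a large positive gap means the model holds the local knowledge but does not surface it unless the cultural context is stated.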

Paper Structure

This paper contains 30 sections, 5 equations, 25 figures, 13 tables.

Figures (25)

  • Figure 1: The explicit-implicit localization gap. (A) An example chat interaction in which the model does not localize in the implicit setting, where the cultural context is conveyed only by the language of the request (top), but does localize in the explicit setting (bottom). (B) The $2 \times 2$ grid of experimental configurations used in our analysis. We vary two dimensions: the prompt language and whether the cultural context is included in the prompt. We treat the top-right cell as implicit localization and the bottom row as explicit localization.
  • Figure 2: Heatmap showing the explicit-implicit localization gap across models and languages.
  • Figure 3: Heatmap showing the explicit-implicit localization gap across models and languages with a culturally relevant prefix prepended.
  • Figure 4: Activation patching results. Yellow shows the probability of the localized token for the target prompt (Loc. Prob), blue shows the probability of the non-localized token for the target prompt (Non Loc. Prob), and green shows the probability of answering the question from the source prompt. Shaded regions around plot lines represent 95% confidence intervals (CI), calculated as $\text{mean} \pm 1.96 \times \text{SEM}$. (Left) Source-translated prompt and target English prompt. (Right) Source-translated prompt with cultural context and target non-context translated prompt.
  • Figure 5: Steering results for per-culture vectors calculated using English pairs with $\alpha \in [-2,2]$ across layers [15-30], where the horizontal axis represents the layer at which the steering vector is applied, and the vertical axis indicates the ratio of localized responses. Titles denote prompt language.
  • ...and 20 more figures
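
The confidence bands described in the Figure 4 caption follow the formula $\text{mean} \pm 1.96 \times \text{SEM}$. The sketch below shows one way to compute such an interval, assuming SEM is the sample standard deviation divided by $\sqrt{n}$ (the caption does not say which estimator is used). The function name and the sample values are hypothetical.

```python
import math
import statistics

def mean_ci95(values):
    """Return (lower, upper) bounds of mean +/- 1.96 * SEM.

    Assumption: SEM = sample standard deviation / sqrt(n).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean - half_width, mean + half_width

# Hypothetical token probabilities at one layer across four prompts.
probs = [0.2, 0.4, 0.6, 0.8]
lower, upper = mean_ci95(probs)
```

The factor 1.96 relies on a normal approximation, so with few prompts per point the resulting band is only a rough guide.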