Localized Cultural Knowledge is Conserved and Controllable in Large Language Models
Veniamin Veselovsky, Berke Argin, Benedikt Stroebl, Chris Wendler, Robert West, James Evans, Thomas L. Griffiths, Arvind Narayanan
TL;DR
This work investigates how large language models encode local cultural knowledge that can be surfaced or suppressed depending on prompting. It introduces the explicit-implicit localization gap (EI-Gap) to quantify the difference between explicit cultural prompting and implicit, language-driven localization, showing that explicit prompts improve cultural localization but reduce diversity and increase stereotyping. Using activation patching and contrastive activation steering, the authors identify mid-layer circuits that mediate localization and demonstrate steering vectors that align outputs with target cultures across languages and tasks while reducing bias relative to explicit prompting. The findings suggest a universal, language-agnostic mechanism for cultural localization and point to practical uses in culturally aware translation and customization, as well as to further soft control of LLMs.
Abstract
Just as humans display language patterns influenced by their native tongue when speaking new languages, LLMs often default to English-centric responses even when generating in other languages. Nevertheless, we observe that local cultural information persists within the models and can be readily activated for cultural customization. We first demonstrate that explicitly providing cultural context in prompts significantly improves the models' ability to generate culturally localized responses. We term the disparity in model performance with versus without explicit cultural context the explicit-implicit localization gap. This gap indicates that while cultural knowledge exists within LLMs, it may not naturally surface in multilingual interactions unless cultural context is explicitly provided. Despite this benefit, however, explicitly prompted answers are less diverse and tend toward stereotypes. Second, we identify an explicit cultural customization vector, conserved across all non-English languages we explore, which enables LLMs to be steered from the synthetic English cultural world model toward each non-English cultural world model. Steered responses retain the diversity of implicit prompting and exhibit fewer stereotypes, dramatically improving the potential for customization. We discuss the implications of explicit cultural customization for understanding the conservation of alternative cultural world models within LLMs, their controllable utility for translation and cultural customization, and the possibility of making the explicit implicit through soft control for expanded LLM function and appeal.
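The two quantities the abstract relies on can be illustrated with a minimal sketch. It assumes (1) that the EI-Gap is the mean localization score under explicit cultural prompting minus the mean score under implicit, language-only prompting, and (2) that the customization vector is a difference of mean activations at one mid layer between explicit and implicit prompts, added back to a hidden state with a scale factor, in the style of contrastive activation addition. The function names `ei_gap`, `steering_vector`, and `apply_steering`, the scale `alpha`, and the toy numbers are hypothetical; this section does not specify the paper's exact scoring, layer choice, token positions, or scaling.

```python
from statistics import mean


def ei_gap(explicit_scores, implicit_scores):
    """Hypothetical EI-Gap: mean localization score with explicit cultural
    context minus mean score with implicit (language-only) context."""
    return mean(explicit_scores) - mean(implicit_scores)


def steering_vector(explicit_acts, implicit_acts):
    """Difference of mean activations (explicit minus implicit) at one layer.

    Each argument is a list of equal-length activation vectors, e.g. hidden
    states collected at an assumed mid layer for paired prompts.
    """
    dim = len(explicit_acts[0])
    mu_explicit = [mean(v[i] for v in explicit_acts) for i in range(dim)]
    mu_implicit = [mean(v[i] for v in implicit_acts) for i in range(dim)]
    return [e - i for e, i in zip(mu_explicit, mu_implicit)]


def apply_steering(hidden, vec, alpha=1.0):
    """Add the steering vector, scaled by alpha, to a single hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vec)]


if __name__ == "__main__":
    # Toy, illustrative values only; not results from the paper.
    print(ei_gap([0.75, 0.25], [0.5, 0.0]))  # → 0.25
    vec = steering_vector([[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
                          [[0.0, 1.0, 2.0], [0.0, 1.0, 0.0]])
    print(vec)  # → [2.0, -1.0, 1.0]
    print(apply_steering([0.5, 0.5, 0.5], vec, alpha=0.5))  # → [1.5, 0.0, 1.0]
```

In a real model the activations would come from a forward hook on a transformer layer rather than hand-written lists. The appeal of this design is that steering requires no cultural cue in the prompt itself, which is one way to make the explicit implicit.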
