Isolating Culture Neurons in Multilingual Large Language Models

Danial Namazifard, Lukas Galke Poech

TL;DR

This work addresses how culture is encoded in multilingual LLMs and whether culture-specific representations can be isolated from linguistic signals. It introduces LAPE and CAPE to identify language- and culture-specific neuron populations, and defines pure culture-specific neurons via set operations, complemented by MUREL, a 69-dataset, 85.2M-token resource spanning six cultures. The results show that culture-specific information substantially resides in upper-layer neuron populations and that many culture neurons are separable from language neurons, enabling targeted interventions with limited cross-language interference. Together, these contributions provide a framework for culturally informed editing and evaluation of multilingual NLP systems with implications for fairness, inclusivity, and alignment.

Abstract

Language and culture are deeply intertwined, yet it has been unclear how and where multilingual large language models encode culture. Here, we build on an established methodology for identifying language-specific neurons to localize and isolate culture-specific neurons, carefully disentangling their overlap and interaction with language-specific neurons. To facilitate our experiments, we introduce MUREL, a curated dataset of 85.2 million tokens spanning six different cultures. Our localization and intervention experiments show that LLMs encode different cultures in distinct neuron populations, predominantly in upper layers, and that these culture neurons can be modulated largely independently of language-specific neurons or those specific to other cultures. These findings suggest that cultural knowledge and propensities in multilingual language models can be selectively isolated and edited, with implications for fairness, inclusivity, and alignment. Code and data are available at https://github.com/namazifard/Culture_Neurons.

Paper Structure

This paper contains 32 sections, 4 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: Overview of our methodology for identifying pure culture-specific neurons in language models. We first identify language-specific neurons ($\mathbb{L}_k$) using literal, language-focused sentences (left), and culture-specific neurons ($\mathbb{C}_m$) using culturally salient phrases (right). By subtracting the language-specific neuron set from the culture-specific neuron set, we obtain pure culture-specific neurons ($\mathbb{C}_m \setminus \mathbb{L}_k$), which encode culture independently of language (bottom).
  • Figure 2: Total language, culture, and pure culture neurons per language and model. The total number of identified neurons is 3,523 for Llama-2-7b, 4,588 for Llama-3.1-8b, 1,004 for Qwen2.5-7b, and 7,373 for Gemma-3-12b.
  • Figure 3: Layer-wise distribution of language-, culture-, and pure culture-specific neurons for each model. The layer-wise distribution per language is shown in Figure \ref{fig:all_languages_layerwise_line_both}.
  • Figure 4: Impact of ablating four neuron subsets on our MUREL test set in Llama-2-7b. Each cell $(i,j)$ shows the perplexity (PPL) change on culture $j$ when ablating neurons of language or culture $i$.
  • Figure 5: Impact of ablating four neuron subsets on our MUREL test set in Llama-3.1-8b. Each cell $(i,j)$ shows the perplexity (PPL) change on culture $j$ when ablating neurons of language or culture $i$.
  • ...and 8 more figures
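
The set operation in the Figure 1 caption, $\mathbb{C}_m \setminus \mathbb{L}_k$, and the per-layer counts plotted in Figure 3 can be illustrated with a minimal Python sketch. It is not taken from the authors' code: representing a neuron as a `(layer, index)` pair, the function names, and the toy sets are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from collections import Counter

# Assumption: a neuron is identified by a (layer, index) pair.
Neuron = tuple[int, int]


def pure_culture_neurons(culture: set[Neuron], language: set[Neuron]) -> set[Neuron]:
    """Return C_m \\ L_k: culture neurons that are not also language neurons."""
    return culture - language


def layer_histogram(neurons: set[Neuron]) -> dict[int, int]:
    """Count neurons per layer, ordered by layer number."""
    counts = Counter(layer for layer, _ in neurons)
    return dict(sorted(counts.items()))


# Toy, made-up neuron sets (hypothetical values, for illustration only).
language_k = {(2, 10), (30, 7), (31, 4)}
culture_m = {(30, 7), (30, 99), (31, 4), (31, 250)}

pure = pure_culture_neurons(culture_m, language_k)
print(sorted(pure))           # [(30, 99), (31, 250)]
print(layer_histogram(pure))  # {30: 1, 31: 1}
```

In this toy case, the two neurons shared with the language set are dropped, leaving only the neurons that respond to the culture but not to the language.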