Neuron-Level Analysis of Cultural Understanding in Large Language Models
Taisei Yamamoto, Ryoma Kumon, Danushka Bollegala, Hitomi Yanaka
TL;DR
This work introduces CULNIG, a gradient-based neuron attribution pipeline for dissecting cultural understanding in large language models. It identifies culture-general neurons, which support cultural understanding across cultures, and culture-specific neurons tied to individual cultures. Together these account for less than 1% of all neurons and are concentrated in shallow-to-middle MLP layers. Masking them degrades performance on cultural benchmarks by up to 30% while leaving general NLU benchmarks largely unaffected. Culture-general neurons have broad, cross-cultural influence, whereas culture-specific neurons affect both their target culture and related ones, suggesting that some representations are shared across cultures. The study also shows that training on NLU benchmarks can erode cultural understanding when modules rich in culture-general neurons are updated, which offers practical guidance for model training and engineering.
Abstract
As large language models (LLMs) are increasingly deployed worldwide, ensuring their fair and comprehensive cultural understanding is important. However, LLMs exhibit cultural bias and limited awareness of underrepresented cultures, while the mechanisms underlying their cultural understanding remain underexplored. To fill this gap, we conduct a neuron-level analysis to identify neurons that drive cultural behavior, introducing a gradient-based scoring method with an additional filtering step that refines the selection. We identify both culture-general neurons, which contribute to cultural understanding regardless of culture, and culture-specific neurons, which are tied to an individual culture. These neurons account for less than 1% of all neurons and are concentrated in shallow to middle MLP layers. We validate their role by showing that suppressing them substantially degrades performance on cultural benchmarks (by up to 30%), while performance on general natural language understanding (NLU) benchmarks remains largely unaffected. Moreover, we show that culture-specific neurons support knowledge not only of the target culture but also of related cultures. Finally, we demonstrate that training on NLU benchmarks can diminish models' cultural understanding when we update modules containing many culture-general neurons. These findings provide insights into the internal mechanisms of LLMs and offer practical guidance for model training and engineering. Our code is available at https://github.com/ynklab/CULNIG
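To make the idea of gradient-based neuron scoring followed by suppression concrete, here is a minimal sketch on a toy one-hidden-layer MLP. It is not the CULNIG implementation: the abstract does not give the exact scoring formula or the filtering step, so the score used here (absolute activation times gradient, a common attribution form) is an assumption. All names (`hidden`, `attribution_scores`, `top_k`), the weights, and the input are hypothetical.

```python
# Toy illustration only: score hidden neurons by |activation * gradient|,
# then mask the top-scoring neuron and compare outputs.
# The scoring rule and all values below are assumptions, not the paper's method.

def hidden(W1, x):
    """ReLU hidden activations h = relu(W1 @ x) for a toy MLP."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    return [max(0.0, a) for a in pre]

def output(w2, h, mask=None):
    """Scalar output y = sum_j w2[j] * h[j], skipping masked neurons."""
    mask = mask or set()
    return sum(w * a for j, (w, a) in enumerate(zip(w2, h)) if j not in mask)

def attribution_scores(w2, h):
    """For this linear readout dy/dh[j] = w2[j], so the score is |h[j] * w2[j]|."""
    return [abs(a * w) for w, a in zip(w2, h)]

def top_k(scores, k):
    """Indices of the k highest-scoring neurons."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(order[:k])

# Hypothetical weights and input.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
w2 = [0.5, -1.0, 2.0, 3.0]
x = [2.0, 1.0]

h = hidden(W1, x)
scores = attribution_scores(w2, h)
selected = top_k(scores, k=1)          # keep only a small fraction of neurons
full = output(w2, h)                   # unmasked output
masked = output(w2, h, mask=selected)  # output with selected neurons suppressed

print("scores:", scores)
print("selected neurons:", selected)
print("output before/after masking:", full, masked)
```

In a real model the gradients would come from backpropagation through the full network on cultural benchmark inputs, and the effect of masking would be measured as a change in benchmark accuracy rather than in a single scalar output.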
