Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

Tianyi Tang; Wenyang Luo; Haoyang Huang; Dongdong Zhang; Xiaolei Wang; Xin Zhao; Furu Wei; Ji-Rong Wen

TL;DR

This work identifies language-specific neurons in large language models using Language Activation Probability Entropy (LAPE) and demonstrates that a small subset of neurons, concentrated in the top and bottom layers, drives multilingual capabilities. By perturbing these neurons, the authors show language-specific degradation with limited cross-language interference, and they demonstrate steering of the model's output language. The study evaluates multiple open-source LLMs (e.g., LLaMA-2 and BLOOM) across several languages, revealing language-dedicated neural populations and a skewed layer distribution tied to semantic alignment and vocabulary mapping. The findings offer a mechanism for targeted influence over multilingual generation and provide a foundation for improved cross-lingual transfer and controlled generation in multilingual settings.

Abstract

Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. It remains a challenging problem to explain the underlying mechanisms by which LLMs process multilingual texts. In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. Specifically, we propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Based on LAPE, we conduct comprehensive experiments on several representative LLMs, such as LLaMA-2, BLOOM, and Mistral. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons, primarily situated in the models' top and bottom layers. Furthermore, we showcase the feasibility of "steering" the output language of LLMs by selectively activating or deactivating language-specific neurons. Our research provides important evidence for understanding and exploring the multilingual capabilities of LLMs.
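
The abstract names LAPE but this page does not reproduce its definition, so the following is only a minimal sketch of one plausible reading: for each neuron, take the probability that it fires on text of each language, normalize those probabilities over languages, and compute the entropy of the result. Low entropy means the neuron fires mostly for one or two languages. The function names, the `ratio` and `min_prob` parameters, and their default values are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def lape_scores(act_prob: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Entropy of each neuron's activation probabilities across languages.

    act_prob[n, k] is assumed to be the fraction of tokens in language k
    on which neuron n has a positive activation.
    """
    # Turn each neuron's row into a probability distribution over languages.
    dist = act_prob / (act_prob.sum(axis=1, keepdims=True) + eps)
    # Shannon entropy; eps keeps log() finite when an entry is zero.
    return -(dist * np.log(dist + eps)).sum(axis=1)


def select_language_specific(
    act_prob: np.ndarray, ratio: float = 0.01, min_prob: float = 0.1
) -> np.ndarray:
    """Return indices of the lowest-entropy neurons (hypothetical thresholds)."""
    scores = lape_scores(act_prob)
    # A neuron that almost never fires in any language also has low entropy,
    # so neurons below min_prob in every language are excluded (assumption).
    active = act_prob.max(axis=1) >= min_prob
    scores = np.where(active, scores, np.inf)
    k = max(1, int(round(ratio * len(scores))))
    chosen = np.argsort(scores)[:k]
    return chosen[np.isfinite(scores[chosen])]


# Toy input: 4 neurons x 3 languages, values made up for illustration.
toy = np.array(
    [
        [0.9, 0.0, 0.0],  # fires for one language only
        [0.5, 0.5, 0.5],  # language-agnostic
        [0.0, 0.0, 0.0],  # never fires
        [0.3, 0.3, 0.4],  # nearly language-agnostic
    ]
)
selected = select_language_specific(toy, ratio=0.25)
```

On the toy matrix, only the first neuron is selected: the language-agnostic rows have entropy close to log 3, and the dead neuron is filtered out before ranking.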

Paper Structure (26 sections, 6 equations, 8 figures, 10 tables)

Introduction
Identifying Language-Specific Regions
Background
Language Activation Probability Entropy
Experiments
Experimental Setup
Models.
Dataset.
Identification methods.
Main Perturbation Experiments
Further Analysis
Distribution and Identification Ratio
Neuron distribution across languages.
Increasing threshold ratios for identification.
Structural Distribution Analysis
...and 11 more sections

Figures (8)

Figure 1: An illustration of region distribution of activated neurons when predicting the next word in language models across different languages. Here, colored circles denote activated neurons. When Chinese-specific neurons are deactivated (denoted by $\otimes$), the model may produce outputs in English.
Figure 2: Impact of four identification methods on the PPL increase of LLaMA-2 (7B). The element at the $i$-th row and $j$-th column is the PPL change for language $j$ due to perturbations in a specific region of language $i$.
Figure 3: Applying our LAPE method to different model types and sizes.
Figure 4: Change in PPL across different languages as the number of deactivated French-specific neurons increases.
Figure 5: Distribution of language-specific neurons across different layers in LLaMA-2 (70B).
...and 3 more figures
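
The Figure 2 caption describes a matrix whose entry at row $i$, column $j$ is the PPL change for language $j$ when the region of language $i$ is perturbed. A minimal sketch of how such a matrix could be assembled is given below, assuming "change" means perturbed PPL minus baseline PPL (the caption does not say whether a difference or a ratio is used). The function name and all numbers are hypothetical.

```python
import numpy as np


def ppl_change_matrix(baseline_ppl, perturbed_ppl) -> np.ndarray:
    """Build the row-i, column-j PPL change matrix.

    baseline_ppl[j]     : PPL on language j with the unmodified model.
    perturbed_ppl[i][j] : PPL on language j after perturbing the neurons
                          identified for language i.
    """
    baseline = np.asarray(baseline_ppl, dtype=float)
    perturbed = np.asarray(perturbed_ppl, dtype=float)
    # Subtract each language's baseline from its column.
    return perturbed - baseline[np.newaxis, :]


# Made-up values for two languages, for illustration only.
baseline = [5.0, 6.0]
perturbed = [
    [9.0, 6.5],   # language 0 neurons perturbed
    [5.2, 12.0],  # language 1 neurons perturbed
]
change = ppl_change_matrix(baseline, perturbed)
```

Under this layout, a strongly diagonal matrix would correspond to the pattern the TL;DR describes: degradation in the perturbed language with limited interference elsewhere.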
