On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

Takeshi Kojima; Itsuki Okimura; Yusuke Iwasawa; Hitomi Yanaka; Yutaka Matsuo

TL;DR

This work probes decoder-only multilingual PLMs to understand how they represent individual languages. Adapting an existing neuron-identification approach, it records per-neuron activation statistics in response to language-specific prompts, scores each neuron $m$ for language relevance by its average precision $AP_m$, and selects the top-1000 and bottom-1000 neurons as language-specific. The study finds that these neurons are largely unique to each language, with cross-language overlap below $5\%$, and that they are concentrated in the early and late layers of the models. By fixing the outputs of the selected neurons to their target-language medians during inference, the authors show that the language of generated text can be controlled in both unconditional and conditional (translation) settings, with notable improvements for Llama2 in translation tasks. The results offer insight into multilingual decoding dynamics and open avenues for language-specific compression and fine-tuning of decoder-based PLMs. The authors acknowledge that the analysis is limited to open models and a subset of languages.
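The scoring step summarized above can be illustrated with a minimal sketch. It assumes one activation statistic per text per neuron and a binary label per text (1 for the target language, 0 otherwise); how the paper aggregates activations over tokens is not stated here, so that input format is an assumption. The function names `average_precision` and `rank_neurons` are hypothetical, and ties between equal activations are simply kept in input order.

```python
def average_precision(scores, labels):
    """AP of one neuron: `scores` are its activations, `labels` are 1 for target-language texts."""
    # Rank texts from highest to lowest activation (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision among the top `rank` texts, recorded at each positive.
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def rank_neurons(activations, labels, k):
    """activations[t][m] is the activation of neuron m on text t.

    Returns (top_k, bottom_k, ap): neuron indices with the highest and
    lowest AP, plus the full list of AP scores.
    """
    n_neurons = len(activations[0])
    ap = [average_precision([row[m] for row in activations], labels)
          for m in range(n_neurons)]
    # Sort neuron indices by descending AP.
    order = sorted(range(n_neurons), key=lambda m: -ap[m])
    return order[:k], order[-k:], ap
```

With `k = 1000` this would correspond to the top-1000 and bottom-1000 selection mentioned above. A high AP means the neuron's activation ranks target-language texts first, and a low AP means it ranks them last.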

Abstract

Current decoder-based pre-trained language models (PLMs) successfully demonstrate multilingual capabilities. However, it is unclear how these models handle multilingualism. We analyze the neuron-level internal behavior of multilingual decoder-based PLMs, specifically examining the existence of neurons that fire "uniquely for each language" within decoder-only multilingual PLMs. We analyze six languages: English, German, French, Spanish, Chinese, and Japanese, and show that language-specific neurons are unique, with a slight overlap (< 5%) between languages. These neurons are mainly distributed in the models' first and last few layers. This trend remains consistent across languages and models. Additionally, we tamper with less than 1% of the total neurons in each model during inference and demonstrate that tampering with a few language-specific neurons drastically changes the probability of target language occurrence in text generation.
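The tampering described above, fixing the outputs of selected neurons to values typical of the target language, might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a layer's output for one token is modeled as a plain list of floats, the replacement value is the per-neuron median over target-language texts, and the helper names `target_medians` and `intervene` are hypothetical. In a real model the same replacement would have to be applied inside the forward pass.

```python
from statistics import median


def target_medians(target_activations, neuron_ids):
    """Median activation of each selected neuron over target-language texts.

    target_activations[t][m] is the activation of neuron m on text t.
    """
    return {m: median(row[m] for row in target_activations) for m in neuron_ids}


def intervene(layer_output, medians):
    """Return a copy of `layer_output` with the selected neurons fixed to their medians."""
    patched = list(layer_output)  # leave the original output untouched
    for m, value in medians.items():
        patched[m] = value
    return patched
```

Only the neurons listed in `neuron_ids` are overwritten, and every other activation passes through unchanged, which keeps the intervention to a small fraction of the model's neurons.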

Paper Structure (37 sections, 3 equations, 40 figures, 12 tables)

Introduction
Related Work
Method
  Finding Language-specific Neurons
  Controlling Language-specific Neurons
Experiment Settings
  Models
  Datasets
Results and Discussion
  Finding Language-specific Neurons
  Controlling Language-specific Neurons
    Unconditional text generation
    Conditional text generation
Conclusion
Limitation
...and 22 more sections

Figures (40)

Figure 1: Overview of our proposal. (Left) Finding language-specific neurons that tend to be activated for a target language. (Right) Controlling the detected language-specific neurons by forcing their activation during inference to manipulate the probability of target language occurrence.
Figure 2: Distribution of Top, Middle, Bottom-1000 neurons across layers. 1st row: (XGLM-564M, de). 2nd row: (BLOOM-1b7, fr). 3rd row: (Llama2-13b, zh).
Figure 3: Model-generated text examples with unconditional text generation setting by XGLM-564M. Given a [BOS] token as input, the model generates outputs through a random sampling method.
Figure 4: Model-generated text examples with conditional text generation setting by Llama-2-7b. Given a machine translation task as input, the model generates outputs through a greedy decoding method.
Figure 5: (Top) Distributional difference of activation values of the top-1000 neurons between target (on) and non-target languages (off). (Bottom) Distributional difference of activation values of the bottom-1000 neurons.
...and 35 more figures
