On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons
Takeshi Kojima, Itsuki Okimura, Yusuke Iwasawa, Hitomi Yanaka, Yutaka Matsuo
TL;DR
This work probes decoder-only multilingual PLMs to understand how they represent individual languages. Adapting a neuron-identification approach, it computes per-neuron activation statistics in response to language-specific prompts, scores each neuron's relevance to a language by its average precision $AP_m$, and identifies the top-1000 and bottom-1000 neurons under this score as language-specific. The study finds that language-specific neurons are largely unique to each language, with cross-language overlap below $5\%$, and that they tend to reside in the early and late layers of the models. Through targeted interventions that fix the outputs of selected neurons to their median activations on target-language texts, the authors demonstrate control over the language of generated text in both unconditional and conditional (translation) settings, with notable improvements for Llama2 in translation tasks. The results offer insights into multilingual decoding dynamics and open avenues for language-specific compression and fine-tuning of decoder-based PLMs, although the study is limited to open models and a subset of languages.
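The neuron-scoring step can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each neuron's response to a text has already been reduced to a single number (for example, a mean over tokens) and stored in a hypothetical matrix `acts` of shape (texts, neurons), with a 0/1 vector `is_target` marking the target-language texts. Each neuron is treated as a detector of the target language and scored by average precision; the highest- and lowest-scoring neurons are then returned. All function names are hypothetical.

```python
import numpy as np

def per_neuron_average_precision(acts: np.ndarray, is_target: np.ndarray) -> np.ndarray:
    """Average precision of each neuron as a detector of target-language texts.

    acts:      (num_texts, num_neurons), one aggregated activation per text
               (hypothetical layout, e.g. a mean over tokens).
    is_target: (num_texts,), 1 if the text is in the target language, else 0.
    """
    is_target = np.asarray(is_target, dtype=float)
    n_pos = is_target.sum()
    if n_pos == 0:
        raise ValueError("need at least one target-language text")
    # Rank texts by activation, highest first, independently for every neuron.
    order = np.argsort(-acts, axis=0, kind="stable")
    hits = is_target[order]  # labels in ranked order, shape (num_texts, num_neurons)
    ranks = np.arange(1, acts.shape[0] + 1)[:, None]
    precision_at_k = np.cumsum(hits, axis=0) / ranks
    # AP = mean of precision@k taken at the ranks where a target text appears.
    return (precision_at_k * hits).sum(axis=0) / n_pos

def top_and_bottom(ap: np.ndarray, k: int):
    """Indices of the k highest-AP and the k lowest-AP neurons."""
    order = np.argsort(ap, kind="stable")
    return order[::-1][:k], order[:k]

def overlap_ratio(a, b) -> float:
    """Fraction of the neurons in set a that also appear in set b."""
    a, b = set(a), set(b)
    return len(a & b) / len(a)
```

With the setting described above, one would call `top_and_bottom(ap, 1000)` once per language; `overlap_ratio` between two languages' selected sets is one simple way to measure cross-language overlap, though the paper's exact overlap definition may differ. For activations without ties, the AP computed here should agree with scikit-learn's `average_precision_score` applied to a single neuron's column.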
Abstract
Current decoder-based pre-trained language models (PLMs) successfully demonstrate multilingual capabilities. However, it is unclear how these models handle multilingualism. We analyze the neuron-level internal behavior of multilingual decoder-based PLMs; specifically, we examine whether decoder-only multilingual PLMs contain neurons that fire "uniquely for each language." We study six languages (English, German, French, Spanish, Chinese, and Japanese) and show that language-specific neurons are unique, with only a slight overlap (< 5%) between languages. These neurons are mainly distributed in the models' first and last few layers, and this trend remains consistent across languages and models. Additionally, we tamper with fewer than 1% of the total neurons in each model during inference and demonstrate that tampering with a few language-specific neurons drastically changes the probability that the target language occurs in generated text.
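The inference-time tampering can also be illustrated with a small NumPy sketch. It is illustrative only and does not reproduce how the paper hooks into each model: it assumes a hypothetical matrix `acts` of shape (texts, neurons) with a 0/1 vector `is_target` marking target-language texts, computes each chosen neuron's median activation on the target-language texts, and then clamps those neurons to their medians in a layer activation array `hidden` whose last axis indexes neurons. The names `target_medians` and `intervene` are hypothetical.

```python
import numpy as np

def target_medians(acts: np.ndarray, is_target: np.ndarray, neuron_idx) -> np.ndarray:
    """Median activation of the chosen neurons over target-language texts only."""
    mask = np.asarray(is_target).astype(bool)
    return np.median(acts[mask][:, neuron_idx], axis=0)

def intervene(hidden: np.ndarray, neuron_idx, fixed_values) -> np.ndarray:
    """Copy of `hidden` with the chosen neurons clamped to fixed values.

    hidden: (..., num_neurons) activations of one layer, e.g. (tokens, neurons).
    Every position along the leading axes receives the same fixed values.
    """
    out = np.array(hidden, dtype=float, copy=True)
    out[..., neuron_idx] = fixed_values
    return out

# Toy usage: 3 target-language texts, 1 other-language text, 3 neurons.
acts = np.array([[1.0, 7.0, 10.0],
                 [3.0, 8.0, 20.0],
                 [5.0, 9.0, 30.0],
                 [100.0, 0.0, 0.0]])
is_target = np.array([1, 1, 1, 0])
chosen = [0, 2]  # hypothetical "language-specific" neurons
medians = target_medians(acts, is_target, chosen)
hidden = np.ones((2, 3))  # stand-in for one layer's activations at 2 positions
steered = intervene(hidden, chosen, medians)
```

Here `medians` is `[3.0, 20.0]` and every row of `steered` becomes `[3.0, 1.0, 20.0]`, while `hidden` itself is left unchanged. In a PyTorch model, the same overwrite could be applied inside a forward hook registered with `torch.nn.Module.register_forward_hook` on the module whose output holds these neuron activations; which module that is depends on the architecture and is left as an assumption here.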
