Language-specific Neurons Do Not Facilitate Cross-Lingual Transfer

Soumen Kumar Mondal, Sayambhu Sen, Abhishek Singhania, Preethi Jyothi

TL;DR

This work interrogates whether language-specific neurons in multilingual LLMs can enhance cross-lingual transfer to low-resource languages. It identifies language-specific neurons via LAPE and Activation Probability 90p, then tests test-time activation interventions and LoRA-based fine-tuning on Llama 3.1 and Mistral Nemo across XNLI and XQuAD. The findings show negligible improvements (less than 1 point) and reveal polysemantic neuron activations that complicate isolated language control, challenging the assumption that language-specific neurons alone can enable transfer. The results emphasize the difficulty of cross-lingual generalization and suggest that alternative approaches are needed to achieve robust multilingual performance.

Abstract

Multilingual large language models (LLMs) aim for robust natural language understanding across diverse languages, yet their performance degrades significantly on low-resource languages. This work explores whether existing techniques for identifying language-specific neurons can be leveraged to enhance cross-lingual task performance on low-resource languages. We conduct detailed experiments covering existing language-specific neuron identification techniques (such as Language Activation Probability Entropy and activation probability-based thresholding) and neuron-specific LoRA fine-tuning with models like Llama 3.1 and Mistral Nemo. We find that such neuron-specific interventions are insufficient to yield cross-lingual improvements on downstream tasks (XNLI, XQuAD) in low-resource languages. This study highlights the challenges in achieving cross-lingual generalization and provides critical insights for multilingual LLMs.

Paper Structure

This paper contains 30 sections, 8 equations, 24 figures, 8 tables.

Figures (24)

  • Figure 1: Perplexity Change (PPXC): Measures the effect of an intervention on target-language perplexity, defined as $\text{PPXC}(i,j) = \text{PPX}(j \,|\, \text{Intervention by 0 at } i) - \text{PPX}(j)$. Lower $\text{PPXC}(i,j)$ values indicate minimal interference, while higher values indicate a significant impact on the model's understanding of language $j$ (measured on 1 million tokens).
  • Figure 2: Llama 3.1: Number of language neurons assigned per language for LAPE in a set of languages {en,es,fr,vi,id,zh,ja}.
  • Figure 3: Llama 3.1: Number of language neurons assigned per language for LAPE in a set of languages {en,bn,hi,ur,mr,pa,ta,te,ml,kn}.
  • Figure 4: Mistral Nemo: Number of language neurons assigned per language for LAPE in a set of languages {en,es,fr,vi,id,zh,ja}.
  • Figure 5: Mistral Nemo: Number of language neurons assigned per language for LAPE in a set of languages {en,bn,hi,ur,mr,pa,ta,te,ml,kn}.
  • ...and 19 more figures
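
The PPXC quantity defined in the Figure 1 caption can be illustrated with a small sketch. It assumes perplexity is computed in the usual way, as the exponential of the mean per-token negative log-likelihood; the caption does not state how PPX is computed, so this is an assumption. The function names `perplexity` and `ppxc` and the toy numbers are hypothetical, chosen only for illustration, and do not come from the paper.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity as exp(mean negative log-likelihood per token).

    Assumption: this standard definition is used; the source does not
    spell out how PPX is computed.
    """
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def ppxc(nlls_intervened: Sequence[float], nlls_baseline: Sequence[float]) -> float:
    """PPXC(i, j) = PPX(j | intervention by 0 at i) - PPX(j).

    nlls_intervened: per-token NLLs on language-j text with the neurons
                     assigned to language i set to zero.
    nlls_baseline:   per-token NLLs on the same text with no intervention.
    """
    return perplexity(nlls_intervened) - perplexity(nlls_baseline)


# Toy per-token NLLs (made-up values, not results from the paper).
baseline = [2.0, 2.0, 2.0]
intervened = [2.5, 2.5, 2.5]

change = ppxc(intervened, baseline)
print(f"PPXC = {change:.3f}")
```

A value near zero would mean that zeroing the neurons assigned to language $i$ barely affects the model on language $j$; a large positive value would mean the intervention substantially hurts it.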