How do Large Language Models Handle Multilingualism?

Yiran Zhao, Wenxuan Zhang, Guizhen Chen, Kenji Kawaguchi, Lidong Bing

TL;DR

The paper investigates how multilingual LLMs process input and proposes MWork, a three-stage workflow where multilingual inputs are understood and translated into English for reasoning before generating responses in the original language. It introduces PLND to identify language-specific neurons without labeled data, and shows that deactivating these neurons markedly degrades multilingual tasks while English remains relatively robust. The authors validate MWork through targeted deactivations across understanding, reasoning, knowledge extraction, and generation, demonstrating that multilingual capabilities hinge on distinct attention and FFN structures. They further show that fine-tuning language-specific neurons with a small corpus can meaningfully boost multilingual performance across high- and low-resource languages without harming others. Overall, MWork offers a precise, data-efficient avenue for analyzing and enhancing multilingual processing in LLMs with practical implications for cross-language NLP deployment.

Abstract

Large language models (LLMs) have demonstrated impressive capabilities across diverse languages. This study explores how LLMs handle multilingualism. Based on observed language ratio shifts among layers and the relationships between network structures and certain capabilities, we hypothesize the LLM's multilingual workflow ($\texttt{MWork}$): LLMs initially understand the query, converting multilingual inputs into English for task-solving. In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures, respectively. In the final layers, LLMs generate responses aligned with the original language of the query. To verify $\texttt{MWork}$, we introduce Parallel Language-specific Neuron Detection ($\texttt{PLND}$) to identify activated neurons for inputs in different languages without any labeled data. Using $\texttt{PLND}$, we validate $\texttt{MWork}$ through extensive experiments involving the deactivation of language-specific neurons across various layers and structures. Moreover, $\texttt{MWork}$ allows fine-tuning of language-specific neurons with a small dataset, enhancing multilingual abilities in a specific language without compromising others. This approach results in an average improvement of $3.6\%$ for high-resource languages and $2.3\%$ for low-resource languages across all tasks with just $400$ documents.
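
The abstract describes neuron detection and deactivation only at a high level, so the following is a minimal sketch of the general idea and not the authors' PLND implementation. It assumes a toy ReLU feed-forward layer, scores each hidden neuron by how much the layer output changes when that neuron is zeroed, and treats a neuron as language-specific when it is consistently important for one language but not shared by all languages. The toy weights, the threshold, and every function name here are hypothetical choices made for illustration.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Set

Vector = List[float]
Matrix = List[Vector]


def hidden_activations(x: Vector, w_in: Matrix) -> Vector:
    """ReLU hidden layer: h_i = max(0, w_in[i] . x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]


def neuron_importance(x: Vector, w_in: Matrix, w_out: Matrix) -> Vector:
    """L2 norm of the output change when each hidden neuron is zeroed on its own.

    The output projection is linear, so zeroing neuron i changes the output by
    h_i * w_out[:, i]; all neurons can therefore be scored in a single pass.
    """
    h = hidden_activations(x, w_in)
    col_norms = [
        math.sqrt(sum(w_out[j][i] ** 2 for j in range(len(w_out))))
        for i in range(len(h))
    ]
    return [abs(hi) * cn for hi, cn in zip(h, col_norms)]


def active_neurons(inputs: List[Vector], w_in: Matrix, w_out: Matrix,
                   threshold: float) -> Set[int]:
    """Neurons whose importance exceeds the threshold on every input (assumed rule)."""
    active: Optional[Set[int]] = None
    for x in inputs:
        scores = neuron_importance(x, w_in, w_out)
        above = {i for i, s in enumerate(scores) if s > threshold}
        active = above if active is None else active & above
    return active or set()


def language_specific(per_language: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Drop neurons that are active for all languages (assumed to be language-agnostic)."""
    shared = set.intersection(*per_language.values())
    return {lang: neurons - shared for lang, neurons in per_language.items()}


def forward(x: Vector, w_in: Matrix, w_out: Matrix,
            deactivated: FrozenSet[int] = frozenset()) -> Vector:
    """Layer output with the chosen hidden neurons forced to zero."""
    h = hidden_activations(x, w_in)
    h = [0.0 if i in deactivated else hi for i, hi in enumerate(h)]
    return [sum(w_out[j][i] * h[i] for i in range(len(h))) for j in range(len(w_out))]


# Hypothetical toy layer: 2 input features, 3 hidden neurons, 1 output.
W_IN: Matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_OUT: Matrix = [[1.0, 1.0, 1.0]]

# Unlabeled "corpora": raw inputs grouped only by language, with no task labels.
CORPORA: Dict[str, List[Vector]] = {
    "en": [[1.0, 0.0], [2.0, 0.0]],
    "fr": [[0.0, 1.0], [0.0, 3.0]],
}

ACTIVE = {lang: active_neurons(xs, W_IN, W_OUT, threshold=0.5)
          for lang, xs in CORPORA.items()}
SPECIFIC = language_specific(ACTIVE)

# Deactivate the neurons found for "fr" and compare outputs on both languages.
FR_MASK = frozenset(SPECIFIC["fr"])
FR_BEFORE = forward([0.0, 1.0], W_IN, W_OUT)
FR_AFTER = forward([0.0, 1.0], W_IN, W_OUT, FR_MASK)
EN_BEFORE = forward([1.0, 0.0], W_IN, W_OUT)
EN_AFTER = forward([1.0, 0.0], W_IN, W_OUT, FR_MASK)
```

In this toy setup, masking the neurons found for "fr" changes the output on the "fr" input while leaving the "en" output untouched, which mirrors the kind of targeted deactivation experiment the abstract refers to. A real model would need activations captured from actual attention and feed-forward layers rather than hand-written weights.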

Paper Structure

This paper contains 47 sections, 11 equations, 7 figures, and 17 tables.

Figures (7)

  • Figure 1: Ratio of English and non-English tokens among layers given non-English queries.
  • Figure 2: Our hypothesized multilingual workflow, MWork, converts multilingual queries to English for reasoning in English and generates responses in the original language, demonstrating a layered processing approach.
  • Figure 3: Number of language-specific neurons when processing multilingual queries.
  • Figure 4: Enhancement results on high-resource languages; each number is the average across languages.
  • Figure 5: Overlapping ratio of language-specific neurons in self-attention and feed-forward structures.
  • ...and 2 more figures