Semantic Pivots Enable Cross-Lingual Transfer in Large Language Models

Kaiyu He, Tong Zhou, Yubo Chen, Delai Qiu, Shengping Liu, Kang Liu, Jun Zhao

TL;DR

The paper quantifies and explains cross-lingual transfer in multilingual LLMs by introducing CLWTD, a word-level, continuous evaluation task. It identifies two inference behaviors, co-occurrence and semantic pivot, and attributes them to word co-occurrence frequency in the pre-training data. By extracting semantic pivots from the pre-training corpus and building a pivot-rich training set, the authors improve the cross-lingual ability of an open-source 1B model relative to baselines. The work advances the interpretability of multilingual models and offers a data-centric way to improve cross-lingual transfer without relying on large parallel corpora.

Abstract

Large language models (LLMs) demonstrate remarkable ability in cross-lingual tasks. Understanding how LLMs acquire this ability is crucial for their interpretability. To quantify the cross-lingual ability of LLMs accurately, we propose a Word-Level Cross-Lingual Translation Task. To find how LLMs learn cross-lingual ability, we trace the outputs of LLMs' intermediate layers in the word translation task. We identify and distinguish two distinct behaviors in the forward pass of LLMs: co-occurrence behavior and semantic pivot behavior. We attribute LLMs' two distinct behaviors to the co-occurrence frequency of words and find the semantic pivot from the pre-training dataset. Finally, to apply our findings to improve the cross-lingual ability of LLMs, we reconstruct a semantic pivot-aware pre-training dataset using documents with a high proportion of semantic pivots. Our experiments validate the effectiveness of our approach in enhancing cross-lingual ability. Our research contributes insights into the interpretability of LLMs and offers a method for improving LLMs' cross-lingual ability.
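
The last step of the abstract, keeping documents with a high proportion of semantic pivots, can be illustrated with a minimal sketch. The abstract does not say how the proportion is computed or what cutoff is used, so everything here is an assumption for illustration: the whitespace tokenization, the token-level ratio, the `min_ratio` threshold, and the function names `pivot_proportion` and `select_pivot_rich` are hypothetical and are not the authors' pipeline.

```python
from typing import Iterable, List, Set


def pivot_proportion(tokens: List[str], pivots: Set[str]) -> float:
    """Return the fraction of tokens that belong to the pivot set."""
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in pivots)
    return hits / len(tokens)


def select_pivot_rich(docs: Iterable[str], pivots: Set[str], min_ratio: float) -> List[str]:
    """Keep documents whose pivot proportion reaches min_ratio.

    Assumption: documents are lowercased and split on whitespace; a real
    pipeline would use the model's own tokenizer.
    """
    selected = []
    for doc in docs:
        tokens = doc.lower().split()
        if pivot_proportion(tokens, pivots) >= min_ratio:
            selected.append(doc)
    return selected


if __name__ == "__main__":
    toy_pivots = {"water", "fire", "earth"}  # hypothetical pivot set
    toy_docs = ["the cat sat", "water and fire and earth", ""]
    print(select_pivot_rich(toy_docs, toy_pivots, min_ratio=0.5))
```

In this toy run only the second document passes, since three of its five tokens are in the pivot set. The threshold trades dataset size against pivot density, and the abstract does not report which value the authors chose.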

Paper Structure

This paper contains 20 sections, 8 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Left: the distribution of cross-lingual ability scores for different model series; models from the same family share a color. Middle: the cross-lingual ability score matrix of the OLMo-7B-0424 model. Right: changes in our metric and FLORES scores during the training of OLMo-7B. Different shapes distinguish tasks, and different colors distinguish language pairs. "avg" denotes the cross-lingual ability averaged across all language pairs.
  • Figure 2: An example of the model's logit lens at the first output token across 31 layers. The left side shows the forward pass for the co-occurrence behavior; the right side shows the forward pass for the semantic pivot behavior.
  • Figure 3: The probability of the semantic pivot set in the last eight layers. The x-axis is the OLMo-7B layer index; the y-axis is the total probability of all tokens in the semantic pivot set and the target word's tokens.
  • Figure 4: The process of constructing a semantic pivot-aware pre-training dataset used to improve the model's cross-lingual ability.
  • Figure 5: The specific distribution of the model's cross-lingual ability score matrix. The title describes the model we evaluate. The vertical axis represents the source language; the horizontal axis represents the target language.
  • ...and 5 more figures
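
The quantities named in the Figure 2 and Figure 3 captions (a per-layer logit lens, and the total probability assigned to a set of tokens) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' code: the hidden states and unembedding matrix are plain Python lists rather than tensors from OLMo-7B, the final layer norm that a logit lens usually applies before unembedding is omitted, and the names `logit_lens` and `token_set_mass_per_layer` are hypothetical.

```python
import math
from typing import List, Sequence, Set


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(hidden: Sequence[float], unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project one hidden state onto the vocabulary (unembed is vocab x dim)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]


def token_set_mass_per_layer(
    hidden_states: Sequence[Sequence[float]],
    unembed: Sequence[Sequence[float]],
    token_ids: Set[int],
) -> List[float]:
    """For each layer, sum the lens probability over the given token ids."""
    masses = []
    for hidden in hidden_states:
        probs = softmax(logit_lens(hidden, unembed))
        masses.append(sum(probs[i] for i in token_ids))
    return masses


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy unembedding
    layers = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]  # toy hidden states for two layers
    print(token_set_mass_per_layer(layers, identity, {0}))
```

With real model activations, the same per-layer sum would be computed once for the semantic pivot token set and once for the target word's tokens, which is what the y-axis of Figure 3 is described as showing.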