Exploring Cross-lingual Latent Transplantation: Mutual Opportunities and Open Challenges

Yangfan Ye, Xiaocheng Feng, Xiachong Feng, Libo Qin, Yichong Huang, Lei Huang, Weitao Ma, Qichen Hong, Zhirui Zhang, Yunfei Lu, Xiaohui Yan, Duyu Tang, Dandan Tu, Bing Qin

TL;DR

This work tackles the imbalance in multilingual capabilities and cultural adaptability in English-centric LLMs by introducing XTransplant, a framework that performs cross-lingual latent transplantation during inference. By transplanting latent activations across languages at decoder layers, the method aims to combine English strengths with non-English knowledge, revealing distinct roles for attention and feed-forward modules in multilingual understanding and culture-specific knowledge capture. Extensive analyses across multiple models, languages, cultures, and granularities show that Attn-level transplantation most benefits multilingual tasks while FFN-level transplantation better supports culture-related understanding, with substantial upper-bound potential beyond vanilla performance. The findings emphasize the existence of underutilized multilingual potential in current LLMs and highlight the need for dynamic, instance-aware layer selection strategies to approach the identified upper bound, offering a new direction for cross-lingual interaction research.

Abstract

Current large language models (LLMs) often exhibit imbalances in multilingual capabilities and cultural adaptability, largely attributed to their English-centric pre-training data. In this paper, we introduce and investigate a cross-lingual latent transplantation (XTransplant) framework, which aims to further exploit the model's internalized multilingual knowledge during inference and examine its effects on the multilingual capability and cultural adaptability of LLMs. The XTransplant framework enables models to harness the complementary strengths of both English and non-English resources by transplanting latent activations across languages. Through extensive analysis, we empirically demonstrate that XTransplant, a form of cross-lingual interaction, has mutually beneficial effects on the multilingual capability and cultural adaptability of LLMs, particularly for low-resource languages and cultures. We further reveal that attention modules play a pivotal role in supporting multilingual understanding, while feed-forward modules are more adept at capturing culture-specific knowledge. In addition, we conduct an in-depth analysis of XTransplant's stability, effectiveness, and generalizability. By probing the upper-bound performance of XTransplant, we expose the considerable underutilization of current LLMs' multilingual potential, a challenge that remains open. We hope our analysis offers a new lens for advancing cross-lingual interactions and better leveraging models' internalized multilingual knowledge.

Paper Structure

This paper contains 49 sections, 5 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Overview of the $\mathcal{X}$Transplant framework (feed-forward level). $\mathcal{X}$Transplant leverages the latent activations from the $i$-th layer of prompting in language A to replace the activations at the $j$-th layer when prompting in language B, thereby influencing the forward propagation in language B.
  • Figure 2: The effectiveness results of $\mathcal{X}$Transplant on LLaMA-2-7B-Chat at Attn-level and FFN-level against the vanilla performance. (a) represents the overall "Win, Tie, Lose" rates and (b) represents the average performance of all $N^2$ configurations (results on more models are in Figure 5 for Mistral and in the corresponding Qwen figure).
  • Figure 3: The average layer-wise performance gains or declines of $\mathcal{X}$Transplant on LLaMA-2-7B-Chat, under different source or target layer configurations (results on more models are in the corresponding Mistral and Qwen figures).
  • Figure 4: The layer-wise instance-aware upper bound results across different LLMs and PilotSets. The left represents the source-wise upper bound and the right represents the target-wise upper bound.
  • Figure 5: The effectiveness results of $\mathcal{X}$Transplant on Mistral-7B-Instruct-v0.3 at Attn-level and FFN-level against the vanilla performance. (a) represents the overall "Win, Tie, Lose" rates and (b) represents the average performance of all $N^2$ configurations.
  • ...and 5 more figures
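
The mechanism in the Figure 1 caption and the "Win, Tie, Lose" comparison over all $N^2$ configurations in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy residual stack in NumPy in place of a real LLM, and every name in it (`make_layers`, `forward`, `xtransplant`, `sweep`, `win_tie_lose`) and the scalar toy score are hypothetical. It shows only the core idea: the feed-forward output of layer $i$ from a language-A pass replaces the feed-forward output of layer $j$ in a language-B pass, and each of the $N^2$ layer pairs is then compared against the vanilla pass.

```python
import numpy as np


def make_layers(n_layers, d_model, seed=0):
    """Hypothetical toy 'decoder': one random FFN weight matrix per layer."""
    rng = np.random.default_rng(seed)
    return [rng.normal(scale=0.3, size=(d_model, d_model)) for _ in range(n_layers)]


def forward(layers, x, patch=None):
    """Run a residual stack h <- h + tanh(h @ W).

    patch: optional (j, activation) pair; the FFN output of layer j is
    replaced by `activation` before it is added to the residual stream.
    Returns the final hidden state and the list of per-layer FFN outputs.
    """
    h = np.array(x, dtype=float)
    ffn_outputs = []
    for idx, W in enumerate(layers):
        out = np.tanh(h @ W)
        if patch is not None and idx == patch[0]:
            out = patch[1]  # transplanted activation overrides this layer
        ffn_outputs.append(out)
        h = h + out
    return h, ffn_outputs


def xtransplant(layers, x_a, x_b, i, j):
    """Replace layer j's FFN output in the language-B pass with layer i's
    FFN output from the language-A pass; return B's final hidden state."""
    _, acts_a = forward(layers, x_a)
    h_b, _ = forward(layers, x_b, patch=(j, acts_a[i]))
    return h_b


def sweep(layers, x_a, x_b, score_fn):
    """Score all N^2 (source layer i, target layer j) configurations."""
    n = len(layers)
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            scores[i, j] = score_fn(xtransplant(layers, x_a, x_b, i, j))
    return scores


def win_tie_lose(scores, vanilla, tol=1e-9):
    """Fraction of configurations scoring above, equal to, or below vanilla."""
    scores = np.asarray(scores, dtype=float)
    win = float(np.mean(scores > vanilla + tol))
    lose = float(np.mean(scores < vanilla - tol))
    return win, 1.0 - win - lose, lose


# Toy usage: random vectors stand in for prompts in two languages, and a
# dot product with a random "gold" direction stands in for task accuracy.
rng = np.random.default_rng(1)
layers = make_layers(n_layers=4, d_model=8)
x_en, x_xx = rng.normal(size=8), rng.normal(size=8)
gold = rng.normal(size=8)
score = lambda h: float(h @ gold)
vanilla_h, _ = forward(layers, x_xx)
grid = sweep(layers, x_en, x_xx, score)
print(win_tie_lose(grid, score(vanilla_h)))
```

In a real model the same replacement would presumably be applied through a hook on the chosen decoder layer, and the score would be a benchmark metric rather than a dot product. Those details are not given in this summary, so the sketch leaves them out.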