Analysis of Multi-Source Language Training in Cross-Lingual Transfer

Seong Hoon Lim, Taejun Yun, Jinhyeon Kim, Jihun Choi, Taeuk Kim

TL;DR

Multi-Source Language Training (MSLT) improves cross-lingual transfer by exposing multilingual LMs to multiple source languages, which promotes language-agnostic representations and more integrated embedding spaces. The paper demonstrates this through experiments on XLM-R and BLOOM-7B across WikiAnn, XNLI, PAWS-X, and instruction-tuned settings, supported by visualizations and CKA-based analysis. It also shows that the number of source languages matters: benefits peak around three sources and can plateau or decline beyond that. Finally, it proposes practical heuristics for selecting language sets using vocabulary coverage, data availability, and especially linguistic diversity via Lang2Vec. The findings offer actionable guidance for constructing effective multilingual transfer pipelines and highlight the importance of diversity-aware language selection.

Abstract

The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have contributed to addressing this data scarcity problem, there is still ongoing debate about the mechanisms behind their effectiveness. In this work, we focus on one of the promising assumptions about the inner workings of XLT: that it encourages multilingual LMs to place greater emphasis on language-agnostic or task-specific features. We test this hypothesis by examining how the patterns of XLT change with a varying number of source languages involved in the process. Our experimental findings show that the use of multiple source languages in XLT, a technique we term Multi-Source Language Training (MSLT), leads to increased mingling of embedding spaces for different languages, supporting the claim that XLT benefits from making use of language-independent information. On the other hand, we discover that using an arbitrary combination of source languages does not always guarantee better performance. We suggest simple heuristics for identifying effective language combinations for MSLT and empirically demonstrate their effectiveness.
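
To make the diversity-based selection heuristic mentioned in the TL;DR more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each candidate source language has a typological feature vector (for example, one obtained from Lang2Vec) and scores a combination by its mean pairwise cosine distance. The scoring rule, the function names, and the toy vectors are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def mean_pairwise_distance(langs, features):
    """Average cosine distance over all language pairs in `langs` (needs >= 2 languages)."""
    pairs = list(combinations(langs, 2))
    return sum(cosine_distance(features[a], features[b]) for a, b in pairs) / len(pairs)


def most_diverse_combination(candidates, features, k=3):
    """Exhaustively pick the k-language set with the largest mean pairwise distance.

    Assumption: "diversity" is approximated by mean pairwise cosine distance;
    the paper may define its score differently.
    """
    return max(
        combinations(sorted(candidates), k),
        key=lambda combo: mean_pairwise_distance(combo, features),
    )


# Hypothetical toy feature vectors, NOT real Lang2Vec values.
toy_features = {
    "lang_a": np.array([1.0, 0.0, 0.0]),
    "lang_b": np.array([1.0, 0.1, 0.0]),  # nearly identical to lang_a
    "lang_c": np.array([0.0, 1.0, 0.0]),
    "lang_d": np.array([0.0, 0.0, 1.0]),
}

best = most_diverse_combination(toy_features.keys(), toy_features, k=3)
```

With these toy vectors, the search avoids pairing the two near-duplicate languages. The default `k=3` mirrors the observation that gains tend to peak around three source languages. Exhaustive search is only practical for small candidate pools.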

Paper Structure (28 sections, 2 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Overview of the effectiveness of Multi-Source Language Training (MSLT) in cross-lingual transfer. As we adopt more sophisticated approaches for MSLT, we can expect improved performance (from bottom to top).
  • Figure 2: A conceptual illustration of the advantages of MSLT over SSLT. The left illustrates the training process of an LM using only English (en) (i.e., $\text{SSLT}(\texttt{en})$), while the right represents MSLT with English (en) and Spanish (es) (i.e., $\text{MSLT}(\texttt{en}, \texttt{es})$). Incorporating more source languages enhances language-agnostic features and blurs language-specific ones, potentially improving effectiveness for unseen languages such as Korean (ko).
  • Figure 3: Visualization of embeddings and corresponding CKA similarities (Kornblith et al., 2019) for 3 languages: English (en), Arabic (ar), and Indonesian (id). Note that English is used in both SSLT & MSLT, whereas Arabic and Indonesian are not. Therefore, we can observe the impact of SSLT & MSLT on both languages seen and unseen during training. Left: the original XLM-R. Center: XLM-R after $\text{SSLT}(\texttt{en})$. Right: XLM-R after $\text{MSLT}(\texttt{en}, \texttt{es}, \texttt{de})$. We find that while SSLT promotes language-agnostic alignment in the semantic space, MSLT enhances this further, leading to a more integrated space for languages.
  • Figure 4: Performance of XLT can vary depending on the number of source languages. The solid lines correspond to XLM-R$_{\text{Base}}$ and dotted lines to XLM-R$_\text{Large}$.
  • Figure 5: Relative performance gaps vary with source language combinations, reaching as high as a 10-point difference between the best and worst options. The worst combinations of MSLT in XCOPA and XWinograd even harm performance compared to SSLT, highlighting the need for careful source language selection.
  • ...and 1 more figures
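
The CKA similarities referenced in Figure 3 can be illustrated with a short sketch of linear CKA as defined by Kornblith et al. (2019). This is a sketch under assumptions: this page does not say which CKA variant the paper uses, and the input layout (one row per parallel sentence, one embedding matrix per language) is hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two representation matrices.

    x: array of shape (n_examples, d1), e.g. sentence embeddings in language A
    y: array of shape (n_examples, d2), embeddings of the same examples in language B
    Returns a value in [0, 1]; higher means more similar representations.
    """
    # Center each feature (column) so the score ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

In a setting like Figure 3, `x` and `y` would hold embeddings of parallel sentences in two languages, extracted from the same model layer. A higher score after MSLT than after SSLT would be consistent with a more integrated embedding space.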