Analysis of Multi-Source Language Training in Cross-Lingual Transfer
Seong Hoon Lim, Taejun Yun, Jinhyeon Kim, Jihun Choi, Taeuk Kim
TL;DR
Multi-Source Language Training (MSLT) improves cross-lingual transfer by exposing multilingual LMs to multiple source languages, promoting language-agnostic representations and more integrated embedding spaces. The paper demonstrates this via experiments on XLM-R and BLOOM-7B across WikiAnn, XNLI, PAWS-X and instruction-tuned settings, with visualizations and CKA-based analysis. It also shows that the number of source languages matters: benefits peak around three sources and can plateau or decline beyond that. Finally, it proposes practical heuristics for selecting language sets using vocabulary coverage, data availability, and especially linguistic diversity via Lang2Vec. The findings offer actionable guidance for constructing effective multilingual transfer pipelines and highlight the importance of diversity-aware language selection.
Abstract
The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have helped address this data scarcity problem, the mechanisms behind their effectiveness remain under debate. In this work, we focus on one promising assumption about the inner workings of XLT: that it encourages multilingual LMs to place greater emphasis on language-agnostic or task-specific features. We test this hypothesis by examining how the patterns of XLT change as the number of source languages involved varies. Our experimental findings show that the use of multiple source languages in XLT, a technique we term Multi-Source Language Training (MSLT), leads to increased mingling of embedding spaces for different languages, supporting the claim that XLT benefits from making use of language-independent information. On the other hand, we find that an arbitrary combination of source languages does not guarantee better performance. We suggest simple heuristics for identifying effective language combinations for MSLT and empirically demonstrate their effectiveness.
