Rethinking what Matters: Effective and Robust Multilingual Realignment for Low-Resource Languages
Quang Phuoc Nguyen, David Anugraha, Felix Gaschi, Jun Bin Cheng, En-Shiun Annie Lee
TL;DR
This paper investigates whether the benefits of realignment in multilingual models require exhaustive language coverage or can be achieved with strategically selected language subsets. The authors introduce a sentence-level averaging contrastive objective that removes the dependence on word aligners, enabling large-scale evaluation across 65 languages with two models (mBERT and XLM-R) on part-of-speech (PoS) tagging, named entity recognition (NER), and natural language inference (NLI). They find that linguistically diverse subsets, with diversity quantified via URIEL featural diversity, language-family diversity, and script diversity, can match or exceed the performance of the full 65-language realignment, especially for unseen low-resource languages (LRLs). The study demonstrates substantial gains for LRLs, robustness to out-of-distribution languages, and practical reductions in data collection overhead, suggesting that targeted diversity matters more than sheer language count for effective multilingual realignment.
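To make the idea of a sentence-level averaging contrastive objective concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each sentence is represented by the mean of its token embeddings and that translation pairs in a batch are pulled together with an InfoNCE-style loss; the function names `mean_pool` and `contrastive_loss` and the `temperature` value are illustrative assumptions.

```python
import numpy as np

def mean_pool(token_embs, mask):
    """Average token embeddings over non-padding positions.

    token_embs: (batch, seq_len, dim) array of token representations.
    mask: (batch, seq_len) array with 1 for real tokens and 0 for padding.
    """
    mask = mask[..., None].astype(token_embs.dtype)
    summed = (token_embs * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return summed / counts

def contrastive_loss(src, tgt, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of src should match row i of tgt,
    with the other rows of tgt in the batch acting as negatives."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    logits = src @ tgt.T / temperature                    # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because the loss only needs one vector per sentence, this kind of objective requires sentence-aligned parallel text but no word-level alignment tool, which is the property the summary above highlights.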
Abstract
Realignment is a promising strategy to improve cross-lingual transfer in multilingual language models. However, empirical results are mixed and often unreliable, particularly for low-resource languages (LRLs) and for languages that are typologically distant from English. Moreover, word realignment tools often rely on high-quality parallel data, which can be scarce or noisy for many LRLs. In this work, we conduct an extensive empirical study to investigate whether realignment truly benefits from using all available languages or whether strategically selected subsets can offer comparable or even improved cross-lingual transfer, and we study the impact on LRLs. Our controlled experiments show that realignment can be particularly effective for LRLs and that carefully selected, linguistically diverse subsets can match full multilingual alignment and even outperform it for unseen LRLs. This indicates that effective realignment does not require exhaustive language coverage and can reduce data collection overhead while remaining both efficient and robust when guided by informed language selection.
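One way to picture "informed language selection" is a greedy farthest-point heuristic over typological feature vectors. The sketch below is only an illustration under stated assumptions: the abstract does not specify a selection procedure, the feature matrix is a hypothetical toy example rather than real URIEL data, and `select_diverse_subset` is an invented helper name.

```python
import numpy as np

def select_diverse_subset(features, k, seed_index=0):
    """Greedy farthest-point selection (assumed heuristic, not the paper's method):
    repeatedly add the language whose minimum distance to the selected set is largest."""
    features = np.asarray(features, dtype=float)
    k = min(k, len(features))  # cannot select more languages than exist
    selected = [seed_index]
    min_dist = np.linalg.norm(features - features[seed_index], axis=1)
    while len(selected) < k:
        candidate = int(np.argmax(min_dist))
        selected.append(candidate)
        dist_to_candidate = np.linalg.norm(features - features[candidate], axis=1)
        min_dist = np.minimum(min_dist, dist_to_candidate)
    return selected

# Hypothetical binary typological features (one row per language); not real URIEL data.
toy_languages = ["lang_a", "lang_b", "lang_c", "lang_d"]
toy_features = np.array([
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
])

subset = select_diverse_subset(toy_features, k=2)
print([toy_languages[i] for i in subset])
```

In this toy setting the heuristic starts from `lang_a` and then picks the language farthest from it in feature space, so a small subset still spans dissimilar languages instead of clustering around near-duplicates.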
