Rethinking what Matters: Effective and Robust Multilingual Realignment for Low-Resource Languages

Quang Phuoc Nguyen, David Anugraha, Felix Gaschi, Jun Bin Cheng, En-Shiun Annie Lee

TL;DR

This paper investigates whether the benefits of realignment in multilingual models require exhaustive language coverage or can be achieved with strategically selected language subsets. The authors introduce a sentence-level averaging contrastive objective that removes the dependence on word aligners, enabling large-scale evaluation across 65 languages with two models (mBERT and XLM-R) on PoS tagging, NER, and NLI. They find that linguistically diverse subsets, quantified via URIEL featural diversity, language-family diversity, and script diversity, can match or exceed the performance of full 65-language realignment, especially for unseen low-resource languages (LRLs). The study reports substantial gains for LRLs, robustness to out-of-distribution languages, and reduced data collection overhead, suggesting that targeted diversity matters more than sheer language count for effective multilingual realignment.
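
To make the idea of a sentence-level averaging contrastive objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes mean pooling over token vectors, cosine similarity, an InfoNCE-style loss with in-batch negatives, and a temperature of 0.1. None of these details are given in this summary, and the function names (`mean_pool`, `cosine`, `sentence_contrastive_loss`) are hypothetical.

```python
import math


def mean_pool(token_vectors):
    """Average a list of token vectors into a single sentence vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(tok[d] for tok in token_vectors) / count for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentence_contrastive_loss(src_batch, tgt_batch, temperature=0.1):
    """InfoNCE-style loss over mean-pooled sentence vectors (assumed form).

    src_batch[i] and tgt_batch[i] hold the token vectors of a translation
    pair; every other target sentence in the batch serves as a negative.
    No word-level alignment is needed, only sentence-level pairing.
    """
    src = [mean_pool(sentence) for sentence in src_batch]
    tgt = [mean_pool(sentence) for sentence in tgt_batch]
    total = 0.0
    for i, s in enumerate(src):
        logits = [cosine(s, t) / temperature for t in tgt]
        # Numerically stable log-sum-exp over all candidate targets.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the true translation.
        total += log_denominator - logits[i]
    return total / len(src)


# Toy batch: two source sentences with two tokens each, two-dimensional vectors.
source = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
aligned = [[[1.0, 0.1]], [[0.1, 1.0]]]
misaligned = [aligned[1], aligned[0]]

loss_aligned = sentence_contrastive_loss(source, aligned)
loss_misaligned = sentence_contrastive_loss(source, misaligned)
```

In this toy batch the correctly paired translations yield a lower loss than the swapped pairs, which is the signal a realignment step would optimize. Because the loss only needs sentence pairs, it avoids the word aligner that the summary identifies as a bottleneck for low-resource languages.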

Abstract

Realignment is a promising strategy to improve cross-lingual transfer in multilingual language models. However, empirical results are mixed and often unreliable, particularly for typologically distant or low-resource languages (LRLs) compared to English. Moreover, word realignment tools often rely on high-quality parallel data, which can be scarce or noisy for many LRLs. In this work, we conduct an extensive empirical study to investigate whether realignment truly benefits from using all available languages, or if strategically selected subsets can offer comparable or even improved cross-lingual transfer, and study the impact on LRLs. Our controlled experiments show that realignment can be particularly effective for LRLs and that using carefully selected, linguistically diverse subsets can match full multilingual alignment, and even outperform it for unseen LRLs. This indicates that effective realignment does not require exhaustive language coverage and can reduce data collection overhead, while remaining both efficient and robust when guided by informed language selection.

Paper Structure

This paper contains 35 sections, 5 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Overall diagram of the realignment process. Our goal is to empirically investigate how language selection within the realignment dataset impacts overall downstream task performance.
  • Figure 2: Average performance across PoS Tagging, NER, and NLI for XLM-R and mBERT. The baselines are compared against the best-performing configuration from each language subset heuristic.
  • Figure 3: Heatmaps showing overall performance (averaged across four seeds) for different language subsets - HRLs, MRLs, and LRLs - seen and unseen during pre-training of XLM-R and mBERT. The fine-tuning only baseline remains strong for HRLs and MRLs, while realignment significantly improves performance on LRLs. Diversity-based language selection further amplifies these gains for LRLs.
  • Figure 4: Averaged out-of-distribution performance of XLM-R and mBERT on the AmericasNLI dataset, comparing different language selection heuristics against three realignment baselines and a fine-tuning-only baseline. Realignment with diversity-based language subsets outperforms both the realignment and fine-tuning-only baselines.
  • Figure 5: Scaling of average cross-lingual transfer performance with the number of languages used for realignment for XLM-R (left) and mBERT (right).
  • ...and 3 more figures