Model-Based Ranking of Source Languages for Zero-Shot Cross-Lingual Transfer

Abteen Ebrahimi, Adam Wiemerslage, Katharina von der Wense

TL;DR

The paper introduces NN-Rank, a data-driven method that ranks source languages for zero-shot cross-lingual transfer by computing nearest-neighbor signals from intermediate-layer hidden representations of multilingual models. It demonstrates substantial improvements in NDCG over state-of-the-art lexical/linguistic feature baselines across POS tagging and NER, using a broad set of languages and two strong encoders (mBERT and XLM-R). The work further shows that NN-Rank remains effective even with out-of-domain Bible data and provides thorough ablations on domain mismatch, representation layer choice, and target-data requirements, including viable rankings with as few as 25 target examples. These findings challenge reliance on static cross-lingual features and highlight the value of model-informed representations for guiding cross-lingual transfer, with practical implications for language coverage and resource-scarce scenarios.
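The nearest-neighbor idea in the summary above can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not give the exact procedure, so the cosine distance, the neighbor count `k`, the vote-counting rule, and the function name `rank_sources_by_neighbors` are all assumptions made for illustration. The two-dimensional vectors stand in for hidden representations that would, in practice, be extracted from an intermediate layer of a multilingual encoder.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_sources_by_neighbors(target_vecs, source_vecs, k=2):
    """Rank source languages by how often their vectors appear among the
    k nearest neighbors of the target-language vectors.

    ASSUMPTION: a plain vote count over a pooled set of source vectors;
    the paper's actual scoring rule may differ.
    """
    # Pool every source vector together with its language label.
    pool = [(lang, vec) for lang, vecs in source_vecs.items() for vec in vecs]
    votes = Counter({lang: 0 for lang in source_vecs})
    for target in target_vecs:
        nearest = sorted(pool, key=lambda item: cosine_distance(target, item[1]))[:k]
        votes.update(lang for lang, _ in nearest)
    # Most votes first; ties are broken alphabetically for determinism.
    return [lang for lang, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]


# Toy stand-ins for hidden representations (hypothetical values).
source_vecs = {
    "es": [[1.0, 0.0], [0.9, 0.1]],
    "de": [[0.5, 0.5], [0.6, 0.4]],
    "ja": [[0.0, 1.0], [0.1, 0.9]],
}
target_vecs = [[1.0, 0.1], [0.9, 0.2], [0.7, 0.4]]

ranking = rank_sources_by_neighbors(target_vecs, source_vecs, k=2)
print(ranking)
```

Only unlabeled target-language vectors are needed here, which matches the summary's point that the ranking is driven by model representations rather than static language-level features.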

Abstract

We present NN-Rank, an algorithm for ranking source languages for cross-lingual transfer, which leverages hidden representations from multilingual models and unlabeled target-language data. We experiment with two pretrained multilingual models and two tasks: part-of-speech tagging (POS) and named entity recognition (NER). We consider 51 source languages and evaluate on 56 and 72 target languages for POS and NER, respectively. When using in-domain data, NN-Rank beats state-of-the-art baselines that leverage lexical and linguistic features, with average improvements of up to 35.56 NDCG for POS and 18.14 NDCG for NER. As prior approaches can fall back to language-level features if target language data is not available, we show that NN-Rank remains competitive using only the Bible, an out-of-domain corpus available for a large number of languages. Ablations on the amount of unlabeled target data show that, for subsets consisting of as few as 25 examples, NN-Rank produces high-quality rankings which achieve 92.8% of the NDCG achieved using all available target data for ranking.
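Since the results above are reported in NDCG, a short sketch of the metric may help. It uses the standard formulation, in which the discounted cumulative gain of a ranking is divided by that of the ideal ordering. How the paper derives relevance values from transfer performance is not stated in the abstract, so the relevance scores and language codes below are hypothetical. The function returns a value in [0, 1]; the abstract's figures appear to be on a 0 to 100 scale.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: gain at 0-based position p is divided by log2(p + 2)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(predicted_ranking, true_relevance, k=None):
    """NDCG of a predicted ordering, optionally truncated to the top k items.

    predicted_ranking: items ordered from best to worst by the ranking method.
    true_relevance: dict mapping each item to its true relevance (ASSUMED to be
    given; for example, derived from observed transfer accuracy).
    """
    gains = [true_relevance[item] for item in predicted_ranking][:k]
    ideal = sorted(true_relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical relevance of three source languages for one target language.
true_relevance = {"es": 3.0, "de": 2.0, "ja": 1.0}

perfect = ndcg(["es", "de", "ja"], true_relevance)
reversed_order = ndcg(["ja", "de", "es"], true_relevance)
print(perfect, reversed_order)
```

A ranking that places the most useful source languages first scores close to 1, while one that buries them is penalized by the logarithmic discount.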

Paper Structure

This paper contains 47 sections, 8 figures, and 17 tables.

Figures (8)

  • Figure 1: Mean accuracy of a target language over all source languages compared to NDCG for that target language. Each bin contains 10 languages, and the y-axis is the average NDCG for each bin. Shading represents the 95% confidence interval for NDCG scores.
  • Figure 2: Distribution of ranking position for poor source datasets.
  • Figure 3: Data ablation results. Compares the performance using each subsample size to the Main results. The lower subplot shows the number of overlapping source datasets between the top five predicted datasets from the subsample ranking and main ranking.
  • Figure 4: Case study results for French. Token frequency plotted using log scale. Pearson's r is used.
  • Figure 5: Distribution of ranking position for source datasets with high unknown token percentage. A rank position of 0 is used to mark the top-ranked candidate; in the figure, a lower value signifies that the ranking method gave the source candidate a higher rank. We consider the source datasets with greater than 5% UNK tokens, using the mBERT tokenizer.
  • ...and 3 more figures