AlignFreeze: Navigating the Impact of Realignment on the Layers of Multilingual Models Across Diverse Languages

Steve Bakos, Félix Gaschi, David Guzmán, Riddhi More, Kelly Chutong Li, En-Shiun Annie Lee

TL;DR

AlignFreeze addresses the inconsistent benefits of realignment for cross-lingual transfer in multilingual language models. By freezing either the lower or the upper half of the layers during realignment, the method shows that realignment affects all layers but is especially harmful to the lower ones, which AlignFreeze can shield for PoS tagging. Across 4 tasks, 3 models, and 35 languages, front-freezing improves PoS tagging in languages where full realignment fails, and in several settings it generalizes better than full realignment. The work highlights that cross-lingual transfer remains hard to predict and that partial freezing offers a practical, language-aware way to mitigate forgetting while improving transfer for syntactic and morphological tasks.

Abstract

Realignment techniques are often employed to enhance cross-lingual transfer in multilingual language models, but they can sometimes degrade performance in languages that differ significantly from the fine-tuned source language. This paper introduces AlignFreeze, a method that freezes either the lower half or the upper half of the layers during realignment. Through controlled experiments on 4 tasks, 3 models, and 35 languages, we find that realignment affects all layers but can be most detrimental to the lower ones. Freezing the lower layers can prevent this performance degradation. In particular, AlignFreeze improves Part-of-Speech (PoS) tagging performance in languages where full realignment fails: with XLM-R, it provides improvements of more than one standard deviation in accuracy in seven more languages than full realignment.
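The layer selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `frozen_layer_indices` and `apply_freeze`, the strategy labels `"front"`, `"back"`, and `"none"`, and the floor-division split for odd layer counts are all assumptions made for illustration. The sketch only covers the encoder layers; how embeddings are handled is not stated in the text above and is left out.

```python
from typing import Iterable, List


def frozen_layer_indices(num_layers: int, strategy: str) -> List[int]:
    """Return the 0-based indices of layers kept frozen during realignment.

    Hypothetical strategy labels (not taken from the paper):
      "front" freezes the lower half, "back" freezes the upper half,
      "none" freezes nothing (full realignment).
    """
    half = num_layers // 2  # assumption: floor split when num_layers is odd
    if strategy == "front":
        return list(range(half))
    if strategy == "back":
        return list(range(half, num_layers))
    if strategy == "none":
        return []
    raise ValueError(f"unknown strategy: {strategy!r}")


def apply_freeze(layers: Iterable, strategy: str) -> None:
    """Toggle `requires_grad` on each layer's parameters.

    `layers` is any ordered sequence of objects exposing a `parameters()`
    method whose items have a `requires_grad` attribute, which is the
    interface PyTorch modules provide.
    """
    layers = list(layers)
    frozen = set(frozen_layer_indices(len(layers), strategy))
    for index, layer in enumerate(layers):
        for param in layer.parameters():
            # Frozen layers receive no gradient updates during realignment.
            param.requires_grad = index not in frozen
```

With a 12-layer encoder, `frozen_layer_indices(12, "front")` selects layers 0 through 5 and `frozen_layer_indices(12, "back")` selects layers 6 through 11. In this sketch the freeze would be applied only for the realignment step and lifted again (strategy `"none"`) before task fine-tuning; that ordering is also an assumption.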

Paper Structure

This paper contains 35 sections, 1 equation, 2 figures, 24 tables.

Figures (2)

  • Figure 1: Variation of the accuracies with realignment with XLM-R Base for the PoS tagging and NLI tasks. Languages are sorted by the improvement brought by full realignment. The average increase in accuracy is computed over 5 runs. Numerical values and results for other models can be found in the appendix on additional results.
  • Figure 2: Average accuracy for DistilMBERT when filtering the dataset for different percentiles of QE for the PoS tagging task.