The Impact of Language Adapters in Cross-Lingual Transfer for NLU

Jenny Kunz, Oskar Holmström

TL;DR

This work analyzes the utility of language adapters, especially target-language adapters, for zero-shot cross-lingual NLU using XLM-R and mBERT on PAWS-X, XNLI, and XCOPA. Detailed ablations show that the benefit of a target-language adapter is inconsistent across languages, models, and tasks. Retaining the source-language adapter often matches, and sometimes exceeds, the target-language adapter, while removing the language adapter after training causes only a small drop. The findings suggest that much of the cross-lingual transfer performance stems from the base multilingual model and the task adapter, with language adapters contributing only modest gains that do not generalize. The study highlights the need to identify the precise conditions under which language adapters improve performance, and to explore alternative modular architectures and broader language coverage for robust cross-lingual transfer.

Abstract

Modular deep learning has been proposed for the efficient adaptation of pre-trained models to new tasks, domains and languages. In particular, combining language adapters with task adapters has shown potential where no supervised data exists for a language. In this paper, we explore the role of language adapters in zero-shot cross-lingual transfer for natural language understanding (NLU) benchmarks. We study the effect of including a target-language adapter in detailed ablation studies with two multilingual models and three multilingual datasets. Our results show that the effect of target-language adapters is highly inconsistent across tasks, languages and models. Retaining the source-language adapter instead often leads to equivalent, and sometimes better, performance. Removing the language adapter after training has only a weak negative effect, indicating that the language adapters do not have a strong impact on the predictions.
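
The three inference-time setups compared in the abstract (target-language adapter, retained source-language adapter, no language adapter) can be sketched as a small configuration helper. This is a minimal illustration, not the authors' code: the names `SETUPS`, `language_adapter_at_inference`, and `ablation_grid`, the setup labels, and the language codes are all assumptions introduced here. It also assumes the task adapter is trained on source-language data with the source-language adapter active.

```python
from typing import List, Optional, Tuple

# Hypothetical labels for the three inference-time setups described in the abstract.
SETUPS = ("target", "source", "none")


def language_adapter_at_inference(
    setup: str, source_lang: str, target_lang: str
) -> Optional[str]:
    """Return the language adapter that is active when evaluating on target_lang.

    Assumption: the task adapter was trained on source_lang data with the
    source-language adapter active; only the inference-time choice varies.
    """
    if setup == "target":
        return target_lang  # swap in the target-language adapter
    if setup == "source":
        return source_lang  # retain the source-language adapter
    if setup == "none":
        return None  # remove the language adapter after training
    raise ValueError(f"unknown setup: {setup!r}")


def ablation_grid(
    source_lang: str, target_langs: List[str]
) -> List[Tuple[str, str, Optional[str]]]:
    """Enumerate (target language, setup, active language adapter) triples."""
    return [
        (tgt, setup, language_adapter_at_inference(setup, source_lang, tgt))
        for tgt in target_langs
        if tgt != source_lang  # zero-shot: skip the training language itself
        for setup in SETUPS
    ]
```

Calling `ablation_grid("en", ["de", "sw"])` would list six evaluation runs, three per target language, which makes explicit that only the language adapter changes between conditions while the base model and task adapter stay fixed.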

Paper Structure (36 sections, 2 figures, 23 tables)

Figures (2)

  • Figure 1: Performance difference between the target-language adapter and the source-language adapter on PAWS-X with XLM-R (left) and mBERT (right) for each source and target language. The amount of pre-training data decreases top-to-bottom/left-to-right.
  • Figure 2: Performance difference between the target-language adapter and the source-language adapter on XNLI with XLM-R (left) and mBERT (right) for each source and target language. The amount of pre-training data decreases top-to-bottom/left-to-right.
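
The quantity plotted in both figures is a per-pair difference between two setups. A minimal sketch of that computation follows, assuming the sign convention "target-language adapter minus source-language adapter" (the captions do not state the direction). The function name `adapter_difference` and the accuracy values are placeholders invented for illustration; they are not results from the paper.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (source language, target language)


def adapter_difference(
    target_adapter_acc: Dict[Pair, float],
    source_adapter_acc: Dict[Pair, float],
) -> Dict[Pair, float]:
    """Per-pair difference: target-language adapter minus source-language adapter.

    Under this (assumed) sign convention, a positive value means the
    target-language adapter helped for that (source, target) pair.
    """
    return {
        pair: target_adapter_acc[pair] - source_adapter_acc[pair]
        for pair in target_adapter_acc
        if pair in source_adapter_acc  # only pairs evaluated in both setups
    }


# Placeholder accuracies, invented for illustration only (NOT from the paper).
target_acc = {("en", "de"): 0.80, ("en", "sw"): 0.60}
source_acc = {("en", "de"): 0.75, ("en", "sw"): 0.65}

diff = adapter_difference(target_acc, source_acc)
```

With these placeholder inputs, the entry for `("en", "de")` is positive and the entry for `("en", "sw")` is negative, which is the kind of mixed-sign pattern the abstract describes as inconsistent across languages.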