The Impact of Language Adapters in Cross-Lingual Transfer for NLU

Jenny Kunz; Oskar Holmström

TL;DR

This work analyzes the utility of language adapters, especially target-language adapters, for zero-shot cross-lingual NLU, using XLM-R and mBERT on PAWS-X, XNLI, and XCOPA. Detailed ablations show that the benefit of target-language adapters is inconsistent across languages, models, and tasks: retaining the source-language adapter often matches or exceeds the target-language adapter, and removing the language adapter after training has only a weak negative effect. The findings suggest that much of the cross-lingual transfer performance stems from the base multilingual model and the task adapters, with language adapters contributing only modest gains that do not generalize. The study highlights the need to identify the precise conditions under which language adapters help, and to explore alternative modular architectures and broader language coverage.

Abstract

Modular deep learning has been proposed for the efficient adaptation of pre-trained models to new tasks, domains and languages. In particular, combining language adapters with task adapters has shown potential where no supervised data exists for a language. In this paper, we explore the role of language adapters in zero-shot cross-lingual transfer for natural language understanding (NLU) benchmarks. We study the effect of including a target-language adapter in detailed ablation studies with two multilingual models and three multilingual datasets. Our results show that the effect of target-language adapters is highly inconsistent across tasks, languages and models. Retaining the source-language adapter instead often leads to equivalent, and sometimes better, performance. Removing the language adapter after training has only a weak negative effect, indicating that the language adapters do not have a strong impact on the predictions.
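
The three inference-time configurations compared in the abstract (target-language adapter, retained source-language adapter, no language adapter) can be illustrated with a toy sketch. This is not the authors' implementation: the residual bottleneck form, the stacking order (language adapter first, then task adapter), the random weights, and all names (`make_adapter`, `apply_adapter`, `forward`) are assumptions made purely for illustration.

```python
import random

def make_adapter(dim, bottleneck, seed):
    """Create a toy bottleneck adapter: random down- and up-projection matrices."""
    rng = random.Random(seed)
    down = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(bottleneck)]
    up = [[rng.gauss(0, 0.1) for _ in range(bottleneck)] for _ in range(dim)]
    return down, up

def apply_adapter(adapter, h):
    """Residual bottleneck (assumed form): h + up(relu(down(h)))."""
    down, up = adapter
    z = [max(0.0, sum(w * x for w, x in zip(row, h))) for row in down]
    return [x + sum(w * v for w, v in zip(row, z)) for x, row in zip(h, up)]

def forward(h, task_adapter, language_adapter=None):
    """Apply the optional language adapter, then the task adapter (assumed order)."""
    if language_adapter is not None:
        h = apply_adapter(language_adapter, h)
    return apply_adapter(task_adapter, h)

DIM, BOTTLENECK = 8, 2
task = make_adapter(DIM, BOTTLENECK, seed=0)         # trained on source-language task data
source_lang = make_adapter(DIM, BOTTLENECK, seed=1)  # hypothetical source-language adapter
target_lang = make_adapter(DIM, BOTTLENECK, seed=2)  # hypothetical target-language adapter

hidden = [0.5] * DIM  # stand-in for one hidden state of the frozen base model

# The three ablation setups described in the abstract.
setups = {
    "target": forward(hidden, task, target_lang),  # swap in the target-language adapter
    "source": forward(hidden, task, source_lang),  # retain the source-language adapter
    "none": forward(hidden, task, None),           # remove the language adapter
}

for name, output in setups.items():
    print(name, [round(v, 3) for v in output])
```

In each setup the task adapter stays fixed and only the language module changes, which is what lets the ablation isolate the language adapter's contribution to the prediction.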

Paper Structure (36 sections, 2 figures, 23 tables)

Introduction
Related Work
Modular Deep Learning.
Language Adapters.
Experimental Setup
Model and Adapters
Adapter Setups.
Pre-Training Data.
Data Sets
PAWS-X.
XNLI.
XCOPA.
Evaluation Setup
Results
General Trends
...and 21 more sections

Figures (2)

Figure 1: Difference between the target-language adapter and source-language adapter on PAWS-X for XLM-R (left) and mBERT (right) for each source and target language. The amount of pre-training data decreases top-to-bottom/left-to-right.
Figure 2: Difference between the target-language adapter and source-language adapter on XNLI with XLM-R (left) and mBERT (right) for each source and target language. The amount of pre-training data decreases top-to-bottom/left-to-right.
