The Impact of Language Adapters in Cross-Lingual Transfer for NLU
Jenny Kunz, Oskar Holmström
TL;DR
This work analyzes the utility of language adapters, especially target-language adapters, for zero-shot cross-lingual NLU using XLM-R and mBERT on PAWS-X, XNLI, and XCOPA. Extensive ablations show that the benefit of target-language adapters is inconsistent across languages, models, and tasks: retaining the source-language adapter often matches or exceeds the target-language-adapter setup, and removing the language adapter after training has only a weak negative effect. The findings suggest that much of the cross-lingual transfer performance stems from the base multilingual model and the task adapters, with language adapters contributing only modest gains that do not generalize. The study highlights the need to identify the precise conditions under which language adapters improve performance, and to explore alternative modular architectures and broader language coverage for robust cross-lingual transfer.
Abstract
Modular deep learning has been proposed for the efficient adaptation of pre-trained models to new tasks, domains and languages. In particular, combining language adapters with task adapters has shown potential where no supervised data exists for a language. In this paper, we explore the role of language adapters in zero-shot cross-lingual transfer for natural language understanding (NLU) benchmarks. We study the effect of including a target-language adapter in detailed ablation studies with two multilingual models and three multilingual datasets. Our results show that the effect of target-language adapters is highly inconsistent across tasks, languages and models. Retaining the source-language adapter instead often leads to equivalent, and sometimes better, performance. Removing the language adapter after training has only a weak negative effect, indicating that the language adapters do not have a strong impact on the predictions.
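
The three inference-time configurations compared in the abstract can be sketched as plain data. This is an illustrative sketch only, not the authors' code: the names `AdapterStack` and `inference_setups` are hypothetical, the example language codes and task name are placeholders, and placing the language adapter below the task adapter is an assumption based on the common stacked-adapter setup rather than something stated above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AdapterStack:
    """Adapters active at inference, listed bottom to top (assumed order)."""
    language_adapter: Optional[str]  # language code, or None if removed
    task_adapter: str                # task adapter trained on source-language data

    def layers(self) -> List[str]:
        names = []
        if self.language_adapter is not None:
            names.append(f"lang:{self.language_adapter}")
        names.append(f"task:{self.task_adapter}")
        return names


def inference_setups(source_lang: str, target_lang: str, task: str) -> Dict[str, AdapterStack]:
    """Enumerate the three zero-shot evaluation settings described in the abstract."""
    return {
        # Swap in the adapter of the language being evaluated.
        "target_adapter": AdapterStack(target_lang, task),
        # Keep the adapter that was active while the task adapter was trained.
        "source_adapter_retained": AdapterStack(source_lang, task),
        # Drop the language adapter entirely after training.
        "language_adapter_removed": AdapterStack(None, task),
    }


# Hypothetical example: source "en", target "de", task "xnli".
for name, stack in inference_setups("en", "de", "xnli").items():
    print(name, stack.layers())
```

In all three settings the task adapter is the same; only the language adapter slot changes, which is what lets the ablation isolate the language adapter's contribution.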
