No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement

Mateusz Klimaszewski; Piotr Andruszkiewicz; Alexandra Birch

TL;DR

The paper tackles negative interference and limited positive transfer in multilingual pre-trained language models by introducing Language Arithmetic (LA), a training-free post-processing technique that applies the task arithmetic framework to the language adapters of MAD-X-based cross-lingual setups. By forming language vectors and combining them additively, LA blends knowledge from related languages to improve zero-shot and low-resource transfer without additional training. Experiments on NER, NLI, and QA across 13 languages with XLM-R and mBERT show gains in the zero-shot case, where no adapter exists for the target language, and further improvements when existing adapters are enhanced, with the largest benefit in low-resource scenarios. The work highlights the practicality of rapid language prototyping and analyses the choice of $\lambda$ and the role of language relatedness, suggesting broader applicability of training-free arithmetic in multilingual NLP.

Abstract

Modular deep learning is the state-of-the-art solution for lifting the curse of multilinguality, preventing the impact of negative interference and enabling cross-lingual performance in Multilingual Pre-trained Language Models. However, a trade-off of this approach is the reduction in positive transfer learning from closely related languages. In response, we introduce a novel method called language arithmetic, which enables training-free post-processing to address this limitation. Extending the task arithmetic framework, we apply learning via addition to the language adapters, transitioning the framework from a multi-task to a multilingual setup. The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes, acting as a post-processing procedure. Language arithmetic consistently improves the baselines with significant gains, especially in the most challenging case of zero-shot application. Our code and models are available at https://github.com/mklimasz/language-arithmetic .
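The "learning via addition" idea in the abstract can be illustrated with a small sketch. This excerpt does not give the exact formula, so the code below assumes the usual task arithmetic recipe: a language vector is the difference between trained adapter weights and a shared initialisation, and two such vectors are combined with a single weight `lam` and its complement `1 - lam`. The flat dictionary representation, the function names, and the parameter name `down.weight` are hypothetical and chosen only for illustration; they are not taken from the authors' code.

```python
from typing import Dict, List

# Hypothetical flat representation: parameter name -> list of weights.
Weights = Dict[str, List[float]]


def language_vector(adapter: Weights, init: Weights) -> Weights:
    """Return trained adapter weights minus the shared initialisation."""
    return {
        name: [a - i for a, i in zip(adapter[name], init[name])]
        for name in adapter
    }


def language_arithmetic(
    init: Weights, adapter_a: Weights, adapter_b: Weights, lam: float
) -> Weights:
    """Combine two language adapters without any training.

    Assumed formula (not confirmed by this excerpt):
        init + lam * vector_a + (1 - lam) * vector_b
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is expected to lie in [0, 1]")
    if not (init.keys() == adapter_a.keys() == adapter_b.keys()):
        raise ValueError("all adapters must share the same parameter names")

    vec_a = language_vector(adapter_a, init)
    vec_b = language_vector(adapter_b, init)
    return {
        name: [
            i + lam * va + (1.0 - lam) * vb
            for i, va, vb in zip(init[name], vec_a[name], vec_b[name])
        ]
        for name in init
    }
```

In the zero-shot use-case described above, `adapter_a` would be the English adapter and `adapter_b` the adapter of a language related to the target, with `lam` chosen on validation data. Under this assumed formula the result is a weighted average of the two adapters' weights, which is why no gradient step is needed.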

Paper Structure (29 sections, 4 equations, 10 figures, 3 tables)

This paper contains 29 sections, 4 equations, 10 figures, 3 tables.

Introduction
Background
Task vectors & Task arithmetic
Method
Language arithmetic
Application
Training language adapter(s)
Training task adapters
Cross-lingual inference
Post-processing via language arithmetic
Experiments
Experimental setup
Datasets
Related languages
Implementation & training
...and 14 more sections

Figures (10)

Figure 1: Language arithmetic as an extension of the MAD-X framework. Given language and task adapters (left), language arithmetic (right) enables training-free, post-processing improvement in two use-cases: (i) zero-shot, where no language adapter was trained for the target language (shown in the figure as Spanish, which is not part of the existing pool of language adapters, $LA_{es}(en, fr)$), or (ii) improving an existing language adapter via arithmetic with either a related language or the language on which the task adapter was trained (e.g. $LA_{fr}(en, fr)$).
Figure 2: Zero-shot XLM-R language arithmetic evaluation, where one side of the arithmetic is the English adapter and the other is the adapter of a language related to the target (e.g. French for Spanish, $LA_{es}(en, fr)$). The values above the bars show the relative difference to the better proxy. See Figure \ref{fig:mBERT_zeroshot} for the mBERT model.
Figure 3: Variants of language arithmetic compared to the MAD-X method in the use-case of improving an existing target language adapter. The values above the bars show the difference between the better LA setup and the MAD-X framework for the XLM-R model (see Figure \ref{fig:mbert_improving} for mBERT).
Figure 4: NER and NLI evaluation of a set of adapters trained on a Wikipedia subset showcases that language arithmetic $LA_t(t,en)$ (green, dotted line) provides significant gains when compared against direct usage of the adapter (violet, solid line), especially in a very low-resource regime. The x-axis represents the token budget of each trained language adapter.
Figure 5: Interpolation of $\lambda$ values for the zero-shot XLM-R scenario (NER; for NLI and QA see Appendix \ref{sec:appendix_lambda_nli_qa}) on the validation dataset. The horizontal dashed lines represent the baseline scores for both languages used in language arithmetic.
...and 5 more figures
