AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging

Yiran Zhao; Wenxuan Zhang; Huiming Wang; Kenji Kawaguchi; Lidong Bing

TL;DR

AdaMergeX introduces adaptive adapter merging to address the entanglement of task ability and language ability in cross-lingual transfer. By leveraging a reference task to capture language gaps and merging three adapters with a structure-aware strategy, it synthesizes target-task, target-language adapters without extensive labeled data. Empirical results across 12 languages and multiple tasks show consistent gains over prompting, decoupling methods, and standard adapter merging, with strong robustness to backbones and reference tasks. The approach demonstrates practical impact for multilingual LLM fine-tuning when labeled data is scarce and language coverage is broad.

Abstract

As an effective alternative to direct fine-tuning on target tasks in specific languages, cross-lingual transfer addresses the challenge of limited training data by decoupling "task ability" and "language ability": the former is learned by fine-tuning on the target task in the source language, and the latter by fine-tuning on another selected task in the target language. However, existing methods fail to fully separate the task ability from the source language or the language ability from the chosen task. In this paper, we acknowledge the mutual reliance between task ability and language ability and direct our attention toward the gap between the target language and the source language on tasks. Because this gap removes the impact of tasks, we assume that it remains consistent across tasks. Based on this assumption, we propose a new cross-lingual transfer method called AdaMergeX that utilizes adaptive adapter merging. By introducing a reference task, it follows that the divergence of adapters fine-tuned on the reference task in both languages follows the same distribution as the divergence of adapters fine-tuned on the target task in both languages. Hence, we can obtain the target adapter by combining the other three adapters. Furthermore, we propose a structure-adaptive adapter merging method. Our empirical results demonstrate that our approach yields new and effective cross-lingual transfer, outperforming existing methods across all settings.
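The idea of obtaining the target adapter from the other three can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it assumes that the language gap is expressed elementwise, with an additive rule for adapters that add to the weights (LoRA-style) and a multiplicative rule for adapters that rescale activations ((IA)$^3$-style). The function names, the flat-list representation of adapter parameters, and the `weight` hyperparameter are hypothetical.

```python
from typing import List

Vector = List[float]  # hypothetical flat representation of adapter parameters


def merge_additive(task_src: Vector, ref_tgt: Vector, ref_src: Vector,
                   weight: float = 1.0) -> Vector:
    """Assumed rule for additive (LoRA-style) adapters.

    task_src: adapter for the target task in the source language
    ref_tgt:  adapter for the reference task in the target language
    ref_src:  adapter for the reference task in the source language
    The language gap (ref_tgt - ref_src) is added to task_src.
    """
    return [a + weight * (b - c) for a, b, c in zip(task_src, ref_tgt, ref_src)]


def merge_multiplicative(task_src: Vector, ref_tgt: Vector, ref_src: Vector,
                         weight: float = 1.0) -> Vector:
    """Assumed rule for multiplicative ((IA)^3-style) adapters.

    The language gap is the elementwise ratio ref_tgt / ref_src, which
    rescales task_src. Entries of ref_src must be non-zero, and ratios
    should be positive when weight is not an integer.
    """
    return [a * (b / c) ** weight for a, b, c in zip(task_src, ref_tgt, ref_src)]


# Toy values, chosen only to make the arithmetic easy to follow.
additive = merge_additive([0.5, -1.0, 2.0], [1.0, 0.5, 3.0], [0.5, 0.25, 1.0])
multiplicative = merge_multiplicative([1.0, 2.0, 0.5], [2.0, 1.0, 4.0], [1.0, 2.0, 2.0])
print(additive)        # [1.0, -0.75, 4.0]
print(multiplicative)  # [2.0, 1.0, 1.0]
```

The two functions differ only in how the gap is measured and applied, which is the sense in which the merge adapts to the adapter structure; in practice the same operation would be applied per parameter tensor rather than to a flat list.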

Paper Structure (39 sections, 10 equations, 2 figures, 12 tables)

Introduction
Background
LoRA
(IA)$^3$
Adapter & Prefix-Tuning
AdaMergeX: Adaptive Adapter Merging for Cross-lingual Transfer
Cross-Lingual Transfer via Adapter Merging
Structure-Adaptive Adapter Merging
LoRA
(IA)$^3$
Prefix-Tuning
AdaMergeX
Experiments
Experimental Setup
Datasets and Language
...and 24 more sections

Figures (2)

Figure 1: An overview of invariants of the language ability gap among different tasks in the adapter space, where by employing any three we can get the remaining one. In light of this observation, we propose AdaMergeX.
Figure 2: One-shot prompting examples of tested datasets.
