Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning

Kajetan Dymkiewicz, Ivan Vulic, Helen Yannakoudakis, Eilam Shapira, Roi Reichart, Anna Korhonen

TL;DR

The paper investigates how single-source PEFT/LoRA fine-tuning on one task-language pair propagates across a broad task-language grid in open-weight LLMs. Evaluating across model families and sizes, it finds robust positive transfer when the target shares the source task but differs in language (Matched-Task, Cross-Language), collateral degradation in off-task directions, and a stable donor-recipient structure among languages and tasks. Using mixed-effects variance decomposition and a Consistency Index, it shows regime-dependent transfer dynamics and partial cross-model agreement, offering practical heuristics for risk-aware fine-tuning and specialization. The findings indicate when to favor matched-task sources and when to adopt multi-source or regularized schedules to balance gains against potential harm, with implications for multilingual, multi-task AI systems.

Abstract

Large language models (LLMs) perform strongly across tasks and languages, yet how improvements in one task or language affect other tasks and languages and their combinations remains poorly understood. We conduct a controlled PEFT/LoRA study across multiple open-weight LLM families and sizes, treating task and language as transfer axes while conditioning on model family and size; we fine-tune each model on a single task-language source and measure transfer as the percentage-point change versus its baseline score when evaluated on all other task-language target pairs. We decompose transfer into (i) Matched-Task (Cross-Language), (ii) Matched-Language (Cross-Task), and (iii) Cross-Task (Cross-Language) regimes. We uncover two consistent general patterns. First, a pronounced on-task vs. off-task asymmetry: Matched-Task (Cross-Language) transfer is reliably positive, whereas off-task transfer often incurs collateral degradation. Second, a stable donor-recipient structure across languages and tasks (hub donors vs. brittle recipients). We outline implications for risk-aware fine-tuning and model specialisation.
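
The transfer measure and the three regimes described in the abstract can be illustrated with a minimal Python sketch. It assumes that scores are on a 0-100 scale and that the percentage-point change is a plain difference from the baseline; the names `Cell`, `transfer_pp`, and `regime` are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """One task-language pair in the evaluation grid (hypothetical type)."""
    task: str
    lang: str


def transfer_pp(baseline: float, finetuned: float) -> float:
    """Percentage-point change of the fine-tuned score versus the baseline.

    Assumes both scores are already expressed on a 0-100 scale.
    """
    return finetuned - baseline


def regime(source: Cell, target: Cell) -> str:
    """Label a source -> target pair with one of the three transfer regimes."""
    if source == target:
        # The trained cell itself is not a transfer target.
        return "source_cell"
    if source.task == target.task:
        return "matched_task_cross_language"
    if source.lang == target.lang:
        return "matched_language_cross_task"
    return "cross_task_cross_language"
```

For example, a model fine-tuned on a hypothetical `Cell("nli", "tr")` and evaluated on `Cell("nli", "sw")` falls in the Matched-Task (Cross-Language) regime, while evaluation on `Cell("qa", "sw")` falls in the Cross-Task (Cross-Language) regime.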

Paper Structure

This paper contains 28 sections, 2 equations, 4 figures, 13 tables.

Figures (4)

  • Figure 1: Matched-Task (Cross-Language) vs. off-task Pareto frontier. Each point is a fine-tuning run. The x-axis shows the gain (pp): mean $\Delta_{\%}$ on the trained dataset aggregated over all languages excluding the trained language $(d,\ell^{*})$. The y-axis shows the mean off-task change (pp) across all other dataset--language pairs. Colours indicate source task type; marker shape encodes model family; marker size encodes size bucket S/M/L ($\leq 1.5$B / 2--6.9B / $\geq 7$B). Dashed lines mark zero gain/impact.
  • Figure 2: Language donor vs. recipient roles. Each code marks a language positioned by its Donor Score (x-axis) and Recipient Score (y-axis), computed within-task and cross-lingually while excluding the trained source cell. Quadrants (shaded): Donor+ & Recipient+ (green), Donor- & Recipient- (red), with intermediate tones for the mixed-sign quadrants. Highlighted languages: tr, nl, uk (strong donors and recipients) and yo, sw, ko, ja (weak donors and recipients). Scores are in percentage points (pp).
  • Figure 3: Task-to-task transfer heatmap. Cells show the mean percentage-point change when fine-tuning on the row (donor) task and evaluating the column (recipient) task; the diagonal is masked. Green denotes positive transfer and red denotes negative; numbers mark $|\Delta| \ge 1.0$ pp. See Appendix Table \ref{tab:task-transfer} for the full numeric matrix.
  • Figure 4: Mean $\Delta_{\%}$ (pp) by fine-tuned language (rows) and evaluated language (columns). Green indicates improvement, red indicates degradation. Per-pair values are listed in Appendix Table \ref{tab:lang-lang-transfer}.
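
The donor and recipient scores in Figure 2 could be computed along the following lines. This is only a sketch under stated assumptions: the captions do not give the exact definitions, so the code assumes that a language's Donor Score is the mean pp change it induces on other languages when used as the source, and its Recipient Score is the mean pp change it receives when other languages are the source, with the trained source cell excluded. The function name and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Optional, Tuple


def donor_recipient_scores(
    deltas: Dict[Tuple[str, str], float],
) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    """Map each language to (donor_score, recipient_score) in pp.

    `deltas` maps (source_lang, target_lang) to the mean pp change within
    one task. Pairs with source == target (the trained cell) are skipped.
    A score is None when a language never appears in that role.
    """
    given = defaultdict(list)     # changes a language causes as source
    received = defaultdict(list)  # changes a language receives as target
    for (src, tgt), delta in deltas.items():
        if src == tgt:
            continue  # exclude the trained source cell
        given[src].append(delta)
        received[tgt].append(delta)

    scores = {}
    for lang in set(given) | set(received):
        donor = mean(given[lang]) if lang in given else None
        recipient = mean(received[lang]) if lang in received else None
        scores[lang] = (donor, recipient)
    return scores
```

Under these assumptions, a language with both scores positive would land in the Donor+ & Recipient+ quadrant, and one with both negative in the Donor- & Recipient- quadrant.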