Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning
Kajetan Dymkiewicz, Ivan Vulic, Helen Yannakoudakis, Eilam Shapira, Roi Reichart, Anna Korhonen
TL;DR
The paper investigates how single-source PEFT/LoRA fine-tuning on one task-language pair propagates across a broad task-language grid in open-weight LLMs. Evaluating across model families and sizes, it finds reliably positive transfer when source and target share a task but differ in language, collateral degradation in off-task directions, and a stable donor-recipient structure among languages and tasks. Using mixed-effects variance decomposition and a Consistency Index, it shows regime-dependent transfer dynamics and partial cross-model agreement, and offers practical heuristics for risk-aware fine-tuning and specialisation. The findings indicate when to favor matched-task sources and when to adopt multi-source or regularized schedules that balance gains against potential harm in multilingual, multi-task systems.
Abstract
Large language models (LLMs) perform strongly across tasks and languages, yet how improvements in one task or language affect other tasks, other languages, and their combinations remains poorly understood. We conduct a controlled PEFT/LoRA study across multiple open-weight LLM families and sizes, treating task and language as transfer axes while conditioning on model family and size; we fine-tune each model on a single task-language source and measure transfer as the percentage-point change versus its baseline score when evaluated on all other task-language target pairs. We decompose transfer into (i) Matched-Task (Cross-Language), (ii) Matched-Language (Cross-Task), and (iii) Cross-Task (Cross-Language) regimes. We uncover two consistent general patterns. First, a pronounced on-task vs. off-task asymmetry: Matched-Task (Cross-Language) transfer is reliably positive, whereas off-task transfer often incurs collateral degradation. Second, a stable donor-recipient structure across languages and tasks (hub donors vs. brittle recipients). We outline implications for risk-aware fine-tuning and model specialisation.
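The transfer measure and the three regimes described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes scores are on a 0-100 scale (so a difference is a percentage-point change) and that a per-regime summary is a simple mean, which the abstract does not specify. The function names, the task labels `qa` and `nli`, the language codes, and all scores are hypothetical toy values chosen only to show the bookkeeping.

```python
from collections import defaultdict
from statistics import mean

# Regime names as given in the abstract.
MATCHED_TASK = "Matched-Task (Cross-Language)"
MATCHED_LANG = "Matched-Language (Cross-Task)"
CROSS_BOTH = "Cross-Task (Cross-Language)"


def classify_regime(source, target):
    """Assign a (task, language) target to a regime relative to the source."""
    if source == target:
        # The abstract measures transfer on all *other* pairs only.
        raise ValueError("source and target must differ")
    s_task, s_lang = source
    t_task, t_lang = target
    if s_task == t_task:
        return MATCHED_TASK   # same task, different language
    if s_lang == t_lang:
        return MATCHED_LANG   # same language, different task
    return CROSS_BOTH         # both task and language differ


def transfer_by_regime(source, baseline, finetuned):
    """Mean percentage-point change per regime (averaging is an assumption)."""
    deltas = defaultdict(list)
    for target, base_score in baseline.items():
        if target == source:
            continue
        delta = finetuned[target] - base_score  # fine-tuned minus baseline
        deltas[classify_regime(source, target)].append(delta)
    return {regime: mean(values) for regime, values in deltas.items()}


# Hypothetical toy scores for one model fine-tuned on ("qa", "en").
source = ("qa", "en")
baseline = {("qa", "en"): 60.0, ("qa", "de"): 50.0,
            ("nli", "en"): 70.0, ("nli", "de"): 65.0}
finetuned = {("qa", "en"): 68.0, ("qa", "de"): 53.0,
             ("nli", "en"): 68.0, ("nli", "de"): 64.0}

result = transfer_by_regime(source, baseline, finetuned)
for regime, value in result.items():
    print(f"{regime}: {value:+.1f} pp")
```

In this toy grid the matched-task target gains while both off-task targets lose, which mirrors the on-task vs. off-task asymmetry the abstract reports; the numbers themselves carry no empirical meaning.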
