Measuring Catastrophic Forgetting in Cross-Lingual Transfer Paradigms: Exploring Tuning Strategies

Boshko Koloski, Blaž Škrlj, Marko Robnik-Šikonja, Senja Pollak

TL;DR

The paper investigates how cross-lingual transfer strategies and tuning methods interact to influence both task performance and catastrophic forgetting in multilingual NLP. It systematically compares zero-shot transfer, intermediate training (IT), and cross-lingual validation (CLV), each combined with either full-model fine-tuning or adapters, across multiple classification tasks and languages. Forgetting is quantified with the metrics of Kemker et al., using explicit definitions of $\Omega_{base}$, $\Omega_{new}$, and $\Omega_{all}$, with English as the base language whose retention is tracked across multiple transfers. Key findings show that IT generally yields stronger target-language performance, whereas CLV in most cases better preserves knowledge of the base language across several transfers. The study also releases open-source cross-lingual adapters and offers practical guidance on choosing a method under computational constraints. Overall, the work advances the understanding of memory dynamics in cross-lingual learning and gives actionable recommendations for balancing performance and retention in resource-constrained multilingual deployments.
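For reference, the three metrics as originally defined by Kemker et al. (2018) are reproduced below. The paper may adapt the normalization to the cross-lingual setting, so read this as a sketch rather than the paper's exact equations. Here $T$ is the number of training sessions (assumed: the initial English session followed by one session per cross-lingual transfer), $\alpha_{base,i}$ is the accuracy on the base (English) test set after session $i$, $\alpha_{new,i}$ is the accuracy on the newly learned data right after session $i$, $\alpha_{all,i}$ is the accuracy on all test data seen so far, and $\alpha_{ideal}$ is a reference (ideal, offline) accuracy on the base set.

```latex
\begin{aligned}
\Omega_{base} &= \frac{1}{T-1}\sum_{i=2}^{T}\frac{\alpha_{base,i}}{\alpha_{ideal}} \\
\Omega_{new}  &= \frac{1}{T-1}\sum_{i=2}^{T}\alpha_{new,i} \\
\Omega_{all}  &= \frac{1}{T-1}\sum_{i=2}^{T}\frac{\alpha_{all,i}}{\alpha_{ideal}}
\end{aligned}
```

$\Omega_{base}$ isolates retention of the base language, $\Omega_{new}$ measures how well each new language is learned, and $\Omega_{all}$ combines the two.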

Abstract

Cross-lingual transfer is a promising technique for solving tasks in less-resourced languages. In this empirical study, we compare two fine-tuning approaches, combined with zero-shot and full-shot learning, for large language models in a cross-lingual setting. As fine-tuning strategies, we compare parameter-efficient adapter methods with fine-tuning of all parameters. As cross-lingual transfer strategies, we compare intermediate training (IT), which uses each language sequentially, and cross-lingual validation (CLV), which uses the target language already in the validation phase of fine-tuning. We assess the success of transfer and the extent of catastrophic forgetting in the source language caused by cross-lingual transfer, i.e., how much previously acquired knowledge is lost when new information is learned in a different language. The results on two classification problems, hate speech detection and product reviews, each with datasets in several languages, show that the IT cross-lingual strategy outperforms CLV for the target language. Our findings also indicate that, in the majority of cases, the CLV strategy retains knowledge of the base language (English) better than the IT strategy when catastrophic forgetting is evaluated over multiple cross-lingual transfers.
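A minimal Python sketch of how the three forgetting scores could be computed from per-session accuracies, assuming the Kemker et al. (2018) definitions with one session per cross-lingual transfer after the initial English session. The function name `kemker_omegas` and its arguments are hypothetical and do not come from the paper's code; the numbers in the usage line are made up purely for illustration.

```python
from typing import Dict, Sequence


def kemker_omegas(
    acc_base: Sequence[float],
    acc_new: Sequence[float],
    acc_all: Sequence[float],
    acc_ideal: float,
) -> Dict[str, float]:
    """Compute Omega_base, Omega_new and Omega_all (Kemker et al., 2018).

    Each sequence holds one accuracy per session i = 2..T, i.e. one entry per
    transfer after the initial base-language (English) session (assumption):
      acc_base - accuracy on the base-language test set after session i
      acc_new  - accuracy on the newly learned language right after session i
      acc_all  - accuracy on all test data seen up to session i
    acc_ideal is a reference accuracy on the base set (hypothetically, a model
    trained only on the base language).
    """
    n = len(acc_base)  # n == T - 1 transfer sessions
    if n == 0 or len(acc_new) != n or len(acc_all) != n:
        raise ValueError("need the same non-zero number of sessions in each list")
    if acc_ideal <= 0:
        raise ValueError("acc_ideal must be positive")

    # Base and overall accuracy are normalized by the reference accuracy.
    omega_base = sum(a / acc_ideal for a in acc_base) / n
    # New-task accuracy is averaged without normalization, as in the original.
    omega_new = sum(acc_new) / n
    omega_all = sum(a / acc_ideal for a in acc_all) / n
    return {"base": omega_base, "new": omega_new, "all": omega_all}


# Illustrative numbers only (two hypothetical transfers), not results from the paper.
scores = kemker_omegas([0.76, 0.72], [0.70, 0.66], [0.74, 0.70], acc_ideal=0.80)
print(scores)
```

An $\Omega_{base}$ close to 1 means base-language accuracy stayed near the reference across transfers; values above 1 are possible if a transfer happens to improve base-language accuracy. Comparing this score for IT and CLV runs over the same sequence of languages is one way to express the retention difference the abstract describes.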

Paper Structure (27 sections, 3 equations, 10 tables)