Low-Rank Adaptation for Multilingual Summarization: An Empirical Study

Chenxi Whitehouse, Fantine Huot, Jasmijn Bastings, Mostafa Dehghani, Chu-Cheng Lin, Mirella Lapata

TL;DR

This work assesses Low-Rank Adaptation (LoRA) as a Parameter-Efficient Fine-Tuning method for multilingual abstractive summarization using PaLM 2 on XLSum and XWikis. It conducts a thorough empirical study across high-data, low-data, and cross-lingual transfer settings, comparing Full Fine-Tuning, FT-Att, and various LoRA configurations, including the LoraHub module-composition approach. Key findings show that LoRA matches or surpasses full fine-tuning in low-data and zero-shot/few-shot cross-lingual transfer, while large models (PaLM 2-S) make LoRA competitive with full fine-tuning in high-data regimes; higher LoRA ranks improve performance but demand careful tuning. The results underscore the practicality of PEFT for complex multilingual generation tasks and offer insights into cross-lingual transfer strategies, including language-specific LoRA modules and continued learning for few-shot settings. Overall, the study highlights LoRA as a scalable, data-efficient alternative for multilingual summarization, with implications for deploying large multilingual models under memory and data constraints.

Abstract

Although advances in pre-trained Large Language Models have significantly accelerated recent progress in NLP, their ever-increasing size poses significant challenges for conventional fine-tuning, especially in memory-intensive tasks. We investigate the potential of Parameter-Efficient Fine-Tuning, focusing on Low-Rank Adaptation (LoRA), in the domain of multilingual summarization, a task that is both challenging (due to typically long inputs) and relatively unexplored. We conduct an extensive study across different data availability scenarios, including high- and low-data settings, and cross-lingual transfer, leveraging models of different sizes. Our findings reveal that LoRA is competitive with full fine-tuning when trained with high quantities of data, and excels in low-data scenarios and cross-lingual transfer. We also study different strategies for few-shot cross-lingual transfer, finding that continued LoRA tuning outperforms full fine-tuning and the dynamic composition of language-specific LoRA modules.
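
For readers unfamiliar with LoRA, the following is a minimal sketch of the general idea, not the paper's implementation: a frozen weight matrix W0 is augmented with a trainable low-rank product B·A, scaled by alpha / rank. The function names, the scaling factor `alpha`, and the 1024 × 1024 layer size are assumptions chosen for illustration and do not reflect PaLM 2's actual dimensions; rank 4 mirrors the LoRA-4 configuration named in the figure captions.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_in * d_out            # full fine-tuning updates every entry of W
    lora = rank * (d_in + d_out)   # LoRA trains only A (rank x d_in) and B (d_out x rank)
    return full, lora


def matmul(a, b):
    """Plain-Python matrix product for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w0, b, a, alpha, rank):
    """Compute W0 + (alpha / rank) * B @ A, where W0 stays frozen."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


# Hypothetical layer size, for illustration only.
full, lora = lora_param_counts(d_in=1024, d_out=1024, rank=4)
print(full, lora)  # 1048576 8192

# B is commonly initialized to zero, so the adapted weight starts equal to W0.
w0 = [[1.0, 0.0], [0.0, 1.0]]
b = [[0.0], [0.0]]
a = [[0.5, -0.5]]
print(lora_effective_weight(w0, b, a, alpha=4, rank=1) == w0)  # True
```

Under these assumed sizes, the low-rank factors hold under 1% of the parameters of the full matrix, which illustrates why LoRA is attractive when memory is the constraint.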

Paper Structure (36 sections, 3 figures, 22 tables)

This paper contains 36 sections, 3 figures, 22 tables.

Figures (3)

  • Figure 1: Results on the XLSum and XWikis datasets with PaLM 2-XXS trained in the low $\rightarrow$ high-data regime: Full FT vs. LoRA-4. Results for up to 256 examples per language are averaged over three seeds, with standard deviation shown in shaded areas.
  • Figure 2: XLSum output examples: zero-shot transfer from English using Full FT and LoRA with PaLM 2-XXS. Full FT fails to generate summaries in the target language and the content is off-topic.
  • Figure 3: Zero-shot cross-lingual transfer on XLSum (top) and XWikis (bottom); PaLM 2-XXS models are trained on one language (SEEN) and tested on another (UNSEEN). We also show results with full fine-tuning on all seen languages (Full FT), LoRA, and (average) weighted combination of language-specific LoRA modules (Avg.LoRA); excl.XX in XWikis denotes leave-one-out training, excluding the test language.