Low-Rank Adaptation for Multilingual Summarization: An Empirical Study
Chenxi Whitehouse, Fantine Huot, Jasmijn Bastings, Mostafa Dehghani, Chu-Cheng Lin, Mirella Lapata
TL;DR
This work assesses Low-Rank Adaptation (LoRA) as a Parameter-Efficient Fine-Tuning (PEFT) method for multilingual abstractive summarization using PaLM 2 on XLSum and XWikis. It conducts an empirical study across high-data, low-data, and cross-lingual transfer settings, comparing Full Fine-Tuning, FT-Att, and various LoRA configurations, including the LoraHub module-composition approach. LoRA matches or surpasses full fine-tuning in low-data settings and in zero-shot and few-shot cross-lingual transfer. With a larger model (PaLM 2-S), LoRA is also competitive with full fine-tuning in high-data regimes. Higher LoRA ranks improve performance but demand careful tuning. The results underscore the practicality of PEFT for complex multilingual generation tasks and offer insights into cross-lingual transfer strategies, including language-specific LoRA modules and continued LoRA tuning for few-shot settings. Overall, the study highlights LoRA as a scalable, data-efficient alternative for multilingual summarization, with implications for deploying large multilingual models under memory and data constraints.
Abstract
Although advances in pre-trained Large Language Models have significantly accelerated recent progress in NLP, their ever-increasing size poses significant challenges for conventional fine-tuning, especially in memory-intensive tasks. We investigate the potential of Parameter-Efficient Fine-Tuning, focusing on Low-Rank Adaptation (LoRA), in the domain of multilingual summarization, a task that is both challenging (due to typically long inputs) and relatively unexplored. We conduct an extensive study across different data availability scenarios, including high- and low-data settings and cross-lingual transfer, leveraging models of different sizes. Our findings reveal that LoRA is competitive with full fine-tuning when trained with high quantities of data, and excels in low-data scenarios and cross-lingual transfer. We also study different strategies for few-shot cross-lingual transfer, finding that continued LoRA tuning outperforms both full fine-tuning and the dynamic composition of language-specific LoRA modules.
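
For readers unfamiliar with LoRA, the following is a minimal sketch of the general idea, not the study's PaLM 2 setup. A frozen weight matrix W is augmented with a trainable low-rank product A·B, so only rank × (d_in + d_out) parameters are updated instead of d_in × d_out. The dimensions, the scaling factor alpha, the zero initialization of B, and all function names here are illustrative assumptions that follow the commonly used LoRA formulation.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w, a, b, alpha, rank):
    """Compute x @ W + (alpha / rank) * (x @ A) @ B.

    x: (n, d_in); w: (d_in, d_out), frozen; a: (d_in, rank) and
    b: (rank, d_out) are the only trainable matrices.
    """
    base = matmul(x, w)
    update = matmul(matmul(x, a), b)
    scale = alpha / rank
    return [[p + scale * q for p, q in zip(r1, r2)] for r1, r2 in zip(base, update)]


def trainable_params(d_in, d_out, rank):
    """Return (full fine-tuning, LoRA) trainable parameter counts for one matrix."""
    return d_in * d_out, rank * (d_in + d_out)


# Toy sizes chosen for illustration only (assumption, not from the paper).
d_in, d_out, rank = 8, 6, 2
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
a = [[rng.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
b = [[0.0] * d_out for _ in range(rank)]  # zero init: the adapter starts as a no-op
x = [[rng.gauss(0, 1) for _ in range(d_in)]]

y_base = matmul(x, w)
y_lora = lora_forward(x, w, a, b, alpha=4.0, rank=rank)
full, lora = trainable_params(d_in, d_out, rank)
```

Because B starts at zero, the adapted layer initially reproduces the frozen model's output, and the trainable parameter count grows linearly with the rank. That linear growth is the knob behind the rank comparisons mentioned in the TL;DR.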
