Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca
Pinzhen Chen, Shaoxiong Ji, Nikolay Bogoychev, Andrey Kutuzov, Barry Haddow, Kenneth Heafield
TL;DR
This work investigates how to scale instruction tuning of large language models across multiple languages under a fixed computational budget. It compares monolingual and multilingual instruction tuning under two tuning paradigms, LoRA and full-parameter fine-tuning, using the Alpaca dataset and machine translations of it. The key findings are that multilingual tuning, particularly with LoRA, is often on par with or superior to language-specific tuning, and that downsampling multilingual data yields robust performance on unseen languages. The results offer practical guidance for expanding language support in open-source LLM ecosystems while highlighting the trade-offs between tuning strategies and model sizes.
Abstract
Foundational large language models (LLMs) can be instruction-tuned to perform open-domain question answering, facilitating applications like chat assistants. While such efforts are often carried out in a single language, we empirically analyze cost-efficient strategies for multilingual scenarios. Our study employs the Alpaca dataset and machine translations of it to form multilingual data, which is then used to tune LLMs through either low-rank adaptation or full-parameter training. Under a controlled computation budget, comparisons show that multilingual tuning is on par with or better than tuning a model for each language. Furthermore, multilingual tuning with downsampled data can be as powerful and more robust. Our findings serve as a guide for expanding language support through instruction tuning.
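To make the idea of downsampled multilingual data concrete, the following is a minimal sketch, not the authors' procedure. It assumes the budget is counted in training examples and that the budget is split as evenly as possible across languages; the function name `downsample_multilingual`, the toy records, and the language codes are all hypothetical.

```python
import random


def downsample_multilingual(datasets, budget, seed=0):
    """Build a mixed-language training set of exactly `budget` examples.

    `datasets` maps a language code to a list of examples. The even
    per-language split is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    languages = sorted(datasets)
    base, extra = divmod(budget, len(languages))
    mixed = []
    for i, lang in enumerate(languages):
        # The first `extra` languages take one more example each so the
        # total matches the budget exactly.
        quota = base + (1 if i < extra else 0)
        if quota > len(datasets[lang]):
            raise ValueError(f"not enough examples for language {lang!r}")
        # Sample without replacement within each language.
        mixed.extend(rng.sample(datasets[lang], quota))
    rng.shuffle(mixed)
    return mixed


# Toy stand-in for an instruction dataset and its translations (hypothetical).
datasets = {
    lang: [{"lang": lang, "instruction": f"{lang}-{i}"} for i in range(10)]
    for lang in ("en", "de", "fi")
}
mixed = downsample_multilingual(datasets, budget=10)
```

With a budget equal to the size of one monolingual dataset, the mixed set costs the same number of training examples as tuning on a single language, which is the kind of controlled comparison the abstract describes.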
