TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes
Bibek Upadhayay, Vahid Behzadan
TL;DR
The paper addresses the high cost and data scarcity of extending LLMs to low-resource languages by introducing Translation-Assisted Cross-Linguality (TaCo), which uses translations in a chain-of-thought process within a curriculum-learning framework. It pairs the Multilingual Instruction-Tuning Dataset (MITS), built by translating Alpaca-52K and Dolly-15K into 132 languages, with a translated Vicuna Benchmark to enable cross-language evaluation, and fine-tunes Guanaco-33B using LoRA adapters. Empirical results show that TaCo significantly outperforms standard instruction-tuning baselines, achieving strong performance on the Vicuna Benchmark in three low-resource languages (Nepali, Sanskrit, Maithili) and one high-resource language (Persian), including near-doubling gains in several categories. The work provides practical resources (translated datasets and public model adapters) that enable scalable multilingual transfer and offers a cost-efficient path toward broader language coverage in LLMs, while outlining future work on efficiency, robustness, and toxicity analysis.
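Here is a rough illustration of what a LoRA adapter adds to a frozen layer. It is a minimal pure-Python sketch of the standard LoRA update h = W0·x + (α/r)·B·A·x, where only the small matrices A and B are trained. It is not the authors' training code. The shapes, rank r, scaling α, and the helper names (`matvec`, `lora_forward`, `lora_param_count`) are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute h = W0 x + (alpha / r) * B (A x).

    w0: frozen base weight, shape (d_out, d_in)
    a:  trainable down-projection, shape (r, d_in)
    b:  trainable up-projection, shape (d_out, r)
    """
    r = len(a)                       # adapter rank = number of rows in A
    base = matvec(w0, x)             # output of the frozen layer
    delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, delta)]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in A and B (full fine-tuning would train d_in * d_out)."""
    return r * (d_in + d_out)


# Toy usage: with B initialised to zeros, the adapter leaves the base layer unchanged.
w0 = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]         # r = 1, d_in = 2
b_zero = [[0.0], [0.0]]  # d_out = 2, r = 1
print(lora_forward(w0, a, b_zero, [2.0, 3.0], alpha=1.0))  # [2.0, 3.0]
print(lora_param_count(4096, 4096, 8), 4096 * 4096)        # 65536 16777216
```

The design point is the parameter count. A rank-8 adapter on a hypothetical 4096×4096 layer trains about 0.4% of the weights that full fine-tuning would. This is why adapters are a cheap artifact to train and release per language.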
Abstract
Creating multilingual LLMs poses a significant challenge. Pretraining or fine-tuning LLMs to adopt new languages is very costly. Furthermore, the benchmark datasets and metrics used to measure model performance in multilingual settings are limited. This paper proposes cost-effective solutions to both challenges. First, we introduce the Multilingual Instruction-Tuning Dataset (MITS), comprising translations of Alpaca-52K, Dolly-15K, and the Vicuna Benchmark into 132 languages. Second, we propose a new method called TaCo: Translation-Assisted Cross-Linguality, which uses translations in a chain-of-thought process to instruction-tune LLMs on new languages through a curriculum-learning process. As a proof of concept, we took the instruction-tuned Guanaco-33B model and further instruction-tuned it with TaCo on three low-resource languages and one high-resource language. Our results show that the TaCo model achieves a score of 82% from GPT-4 for a low-resource language on the Vicuna Benchmark, doubling the performance obtained with instruction tuning alone. Furthermore, TaCo shows promise for creating multilingual LLMs, even for low-resource languages. We have released our datasets and model adapters (https://github.com/UNHSAILLab/TaCo), encouraging the research community to use these resources to advance work on multilingual LLMs.
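The abstract describes TaCo only at a high level: translations used in a chain-of-thought, combined with a curriculum. The sketch below shows one way such training pairs could be assembled. The prompt arrives in the target language. The target output then steps through translate to English, answer in English, and translate the answer back.

The exact field labels, the `ParallelExample` schema, the helper names, and the two-stage "English first, then TaCo" ordering are all hypothetical. The ordering is a generic easier-to-harder curriculum stand-in and not the paper's confirmed schedule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParallelExample:
    """One instruction/response pair plus its translation (hypothetical schema)."""
    language: str
    instruction_en: str
    response_en: str
    instruction_tgt: str
    response_tgt: str


def taco_example(ex: ParallelExample) -> Dict[str, str]:
    """Hypothetical TaCo-style pair: target-language prompt, and an output that
    translates to English, answers in English, then translates the answer back."""
    output = (
        f"Instruction in English: {ex.instruction_en}\n"
        f"Response in English: {ex.response_en}\n"
        f"Response in {ex.language}: {ex.response_tgt}"
    )
    return {"input": ex.instruction_tgt, "output": output}


def english_example(ex: ParallelExample) -> Dict[str, str]:
    """Plain English instruction/response pair."""
    return {"input": ex.instruction_en, "output": ex.response_en}


def curriculum(examples: List[ParallelExample]) -> List[Dict[str, str]]:
    """Assumed two-stage ordering: easier English pairs first, then TaCo pairs."""
    stage1 = [english_example(ex) for ex in examples]
    stage2 = [taco_example(ex) for ex in examples]
    return stage1 + stage2


ex = ParallelExample(
    language="Nepali",
    instruction_en="What is the capital of Nepal?",
    response_en="The capital of Nepal is Kathmandu.",
    instruction_tgt="नेपालको राजधानी के हो?",
    response_tgt="नेपालको राजधानी काठमाडौं हो।",
)
print(taco_example(ex)["output"])
```

The intuition behind this format is that routing the answer through English lets the model reuse the reasoning it already has in a high-resource language. The final translation step then teaches it to generate in the new language.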
