LinguaLIFT: An Effective Two-stage Instruction Tuning Framework for Low-Resource Language Reasoning
Hongbin Zhang, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang
TL;DR
This work tackles the persistent reasoning gap for low-resource languages in multilingual LLMs, which stems from data imbalance and from benchmarks biased toward high-resource languages. It introduces LinguaLIFT, a two-stage instruction-tuning framework that uses a frozen language alignment layer, learned through code-switched tuning, to transfer English reasoning ability to low-resource languages without requiring multilingual instruction data or parallel data. A new benchmark, MMWP, covers 48 languages across low-, medium-, and high-resource levels to evaluate multilingual mathematical reasoning, extending evaluation beyond existing tests dominated by high-resource languages. Experiments show that LinguaLIFT consistently outperforms strong baselines on MMWP and related benchmarks and generalizes across LLMs and tasks. The analysis also covers cross-lingual transfer, the effect of code-switching, and visualization of the learned alignment, pointing to the framework's practical potential for inclusive multilingual reasoning.
Abstract
Large language models (LLMs) have exhibited impressive multilingual reasoning capabilities, driven by extensive multilingual pre-training corpora and instruction fine-tuning data. However, a performance gap persists between high- and low-resource languages on reasoning tasks because of language imbalance in the pre-training corpus. This gap is further obscured by evaluation bias: existing reasoning benchmarks lack coverage of low-resource languages. To alleviate this issue, we propose LinguaLIFT, a two-stage instruction tuning framework for advancing low-resource language reasoning. LinguaLIFT employs a language alignment layer that captures multilingual alignment through code-switched tuning, without requiring multilingual instruction data or parallel data, and thereby transfers reasoning capabilities to low-resource languages using English-only instruction tuning data. To comprehensively evaluate multilingual reasoning capabilities, we introduce the Multilingual Math World Problem (MMWP) benchmark, which spans 21 low-resource, 17 medium-resource, and 10 high-resource languages. Experimental results show that LinguaLIFT outperforms several competitive baselines across MMWP and four widely used benchmarks.
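To make the idea of code-switched tuning data more concrete, the following is a minimal sketch of one common way to build code-switched text: replacing some English words with entries from a bilingual lexicon. This is an illustration under stated assumptions, not the authors' pipeline. The function name `code_switch`, the `ratio` parameter, and the three-entry English-to-Swahili toy lexicon are all hypothetical; the abstract does not specify how the code-switched data is constructed.

```python
import random


def code_switch(sentence, lexicon, ratio=0.5, seed=0):
    """Replace a fraction of English tokens with lexicon translations.

    Hypothetical helper: word-level substitution on whitespace tokens.
    Tokens missing from the lexicon are always left in English.
    """
    rng = random.Random(seed)  # seeded for reproducible examples
    switched = []
    for token in sentence.split():
        translation = lexicon.get(token.lower())
        # Substitute only when a translation exists and the draw falls under ratio.
        if translation is not None and rng.random() < ratio:
            switched.append(translation)
        else:
            switched.append(token)
    return " ".join(switched)


# Toy English-to-Swahili entries, chosen only for illustration.
toy_lexicon = {"has": "ana", "three": "tatu", "books": "vitabu"}

# With ratio=1.0 every token found in the lexicon is replaced.
print(code_switch("Tom has three books", toy_lexicon, ratio=1.0))
```

With `ratio=1.0` the call above replaces every token that has a lexicon entry, and with `ratio=0.0` it returns the sentence unchanged. Intermediate values yield mixed-language sentences of the kind a first-stage alignment layer could be tuned on, before that layer is frozen for English-only instruction tuning.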
