MindMerger: Efficient Boosting LLM Reasoning in non-English Languages
Zixian Huang, Wenhao Zhu, Gong Cheng, Lei Li, Fei Yuan
TL;DR
MindMerger addresses the gap in multilingual reasoning by preserving an LLM's built-in reasoning and language understanding capabilities and augmenting them with the language understanding of an external multilingual model. It introduces a two-step training scheme that first embeds the external language understanding capabilities into the LLM representation and then trains the collaborative use of external and built-in capabilities, without updating the LLM's parameters. Empirically, MindMerger-Soft improves consistently on MGSM, MSVAMP, X-CSQA, and XNLI; on MGSM, average accuracy rises by 6.7% across all languages and by 8.0% across low-resource languages, and it outperforms replacement-based baselines by significant margins. The work highlights that encoder-based multilingual models and alignment-based representation merging are effective for cross-language reasoning, and it demonstrates robustness across multiple LLMs, suggesting broad applicability to multilingual AI tasks.
Abstract
Reasoning capabilities are crucial for Large Language Models (LLMs), yet a notable gap exists between English and non-English languages. To bridge this gap, some works fine-tune LLMs to relearn reasoning capabilities in non-English languages, while others replace non-English inputs with an external model's outputs, such as English translations, to sidestep the LLM's difficulty in understanding non-English text. Unfortunately, these methods often underutilize the LLM's built-in reasoning and language understanding capabilities. To better utilize the minds of reasoning and language understanding in LLMs, we propose a new method, MindMerger, which merges LLMs with the external language understanding capabilities of multilingual models to boost multilingual reasoning performance. Furthermore, we introduce a two-step training scheme that first trains the model to embed the external capabilities into LLMs and then trains the collaborative utilization of the external and built-in capabilities. Experiments on three multilingual reasoning datasets and one language understanding dataset demonstrate that MindMerger consistently outperforms all baselines, especially in low-resource languages. Without updating the parameters of LLMs, MindMerger improves average accuracy on the MGSM dataset by 6.7% across all languages and by 8.0% across low-resource languages.
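To make the merging idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the external multilingual model yields one hidden-state vector per input token, that a small trainable mapping projects those vectors into the LLM's embedding dimension, and that the mapped vectors are concatenated with the LLM's own token embeddings along the sequence axis. The abstract does not specify these details. The names `linear`, `map_external`, and `merge`, the single linear mapping, and the toy dimensions are all hypothetical.

```python
import random


def linear(x, weight, bias):
    """Compute y = x @ weight + bias for a single vector, using plain lists."""
    return [sum(xi * w for xi, w in zip(x, col)) + b
            for col, b in zip(zip(*weight), bias)]


def map_external(states, weight, bias):
    """Project external-encoder states into the LLM embedding dimension.

    Assumption: only this mapping is trained; the LLM stays frozen,
    matching the abstract's "without updating the parameters of LLMs".
    """
    return [linear(s, weight, bias) for s in states]


def merge(external_states, llm_embeddings, weight, bias):
    """Concatenate mapped external states with the LLM's own embeddings.

    Assumption: the merge is a concatenation along the sequence axis, so the
    LLM sees both the external and its built-in view of the input.
    """
    return map_external(external_states, weight, bias) + llm_embeddings


# Toy dimensions and random values, chosen only for illustration.
rng = random.Random(0)
ENC_DIM, LLM_DIM = 4, 6
weight = [[rng.uniform(-0.1, 0.1) for _ in range(LLM_DIM)]
          for _ in range(ENC_DIM)]
bias = [0.0] * LLM_DIM

external_states = [[rng.random() for _ in range(ENC_DIM)] for _ in range(3)]
llm_embeddings = [[rng.random() for _ in range(LLM_DIM)] for _ in range(5)]

merged = merge(external_states, llm_embeddings, weight, bias)
```

Under these assumptions, both training steps would update only `weight` and `bias`. The steps would differ in their training data, which the abstract does not describe and which is therefore not shown here.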
