MindMerger: Efficient Boosting LLM Reasoning in non-English Languages

Zixian Huang; Wenhao Zhu; Gong Cheng; Lei Li; Fei Yuan

TL;DR

MindMerger addresses the gap in multilingual reasoning by preserving LLMs' built-in reasoning and language understanding and augmenting them with external multilingual model capabilities. It introduces a two-stage training scheme that first embeds external language understanding into the LLM representation and then enables collaborative use of external and built-in capabilities, without updating LLM parameters. Empirically, MindMerger-Soft achieves consistent improvements across MGSM, MSVAMP, X-CSQA, and XNLI, with average gains of $6.7\%$ overall and $8.0\%$ in low-resource languages on MGSM, and outperforms replacement-based baselines by significant margins. The work highlights that encoder-based multilingual models and alignment-based representation merging are effective for cross-language reasoning, and demonstrates robustness across multiple LLMs, suggesting broad applicability to multilingual AI tasks.

Abstract

Reasoning capabilities are crucial for Large Language Models (LLMs), yet a notable gap exists between English and non-English languages. To bridge this disparity, some works fine-tune LLMs to relearn reasoning capabilities in non-English languages, while others replace non-English inputs with an external model's outputs, such as English translations, to sidestep the LLM's difficulty in understanding non-English text. Unfortunately, these methods often underutilize the built-in reasoning and language understanding capabilities of LLMs. To make better use of these capabilities, we propose a new method, MindMerger, which merges LLMs with the external language understanding capabilities of multilingual models to boost multilingual reasoning performance. Furthermore, a two-stage training scheme is introduced: the first stage trains the model to embed the external capabilities into LLMs, and the second trains the collaborative use of the external capabilities and the LLMs' built-in capabilities. Experiments on three multilingual reasoning datasets and a language understanding dataset demonstrate that MindMerger consistently outperforms all baselines, especially in low-resource languages. Without updating the parameters of LLMs, the average accuracy on the MGSM dataset improves by 6.7% across all languages and by 8.0% across low-resource languages.
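The merging idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single linear mapping layer, assumes that the mapped encoder states are prepended to the LLM's own token embeddings, and uses toy dimensions and random values. The names `linear_map`, `merge_inputs`, and `mapping_weight` are hypothetical.

```python
import random


def linear_map(vectors, weight):
    """Project each vector of length d_in to length d_out using a d_in x d_out matrix."""
    d_out = len(weight[0])
    return [
        [sum(v[i] * weight[i][j] for i in range(len(v))) for j in range(d_out)]
        for v in vectors
    ]


def merge_inputs(encoder_states, llm_embeddings, weight):
    """Map external encoder states into the LLM embedding space and
    prepend them to the LLM's own embeddings (ordering is an assumption)."""
    mapped = linear_map(encoder_states, weight)
    return mapped + llm_embeddings


random.seed(0)
d_enc, d_llm = 4, 6  # toy hidden sizes, chosen only for illustration

# Stand-ins for the outputs of a multilingual encoder (3 tokens)
# and for the frozen LLM's embedding layer (5 tokens).
encoder_states = [[random.gauss(0, 1) for _ in range(d_enc)] for _ in range(3)]
llm_embeddings = [[random.gauss(0, 1) for _ in range(d_llm)] for _ in range(5)]

# In this sketch the mapping weight is the only trainable component;
# the LLM's parameters stay untouched.
mapping_weight = [[random.gauss(0, 0.1) for _ in range(d_llm)] for _ in range(d_enc)]

merged = merge_inputs(encoder_states, llm_embeddings, mapping_weight)
print(len(merged), len(merged[0]))  # sequence length 8, width 6
```

In this sketch the LLM would consume `merged` in place of its usual input embeddings, so its own view of the query is kept rather than replaced by the external model's output.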

Paper Structure (28 sections, 7 equations, 4 figures, 16 tables)

Introduction
Related Work
Approach
Model Structure
Two-Stage Training
Experiments
Compared Methods
Datasets
Experimental Results
Analysis
The Usage of Multilingual Model
Ablation Studies
Merging with Different LLMs
Representation Space Changes
Supplementary Experiments
...and 13 more sections

Figures (4)

Figure 1: Examples of multilingual mathematical reasoning from the MGSM dataset. An LLM can generate correct or incorrect answers to the same question depending on the language in which it is asked.
Figure 2: Overview of the model structure and training scheme of MindMerger, which consists of an LLM (blue) and an external model (yellow) and is trained with a two-stage scheme.
Figure 3: Ablation experiments of MindMerger-Soft on the MGSM dataset. Lrl., Hrl., and Avg. denote the average accuracy across low-resource languages, high-resource languages, and all languages, respectively. Following MGSM, we regard Bn, Th, and Sw as low-resource languages and the remaining languages as high-resource languages.
Figure 4: t-SNE visualization of the spaces of the LLM embeddings and the mapping layer outputs.
