MERLIN: Multi-Stage Curriculum Alignment for Multilingual Encoder-LLM Integration in Cross-Lingual Reasoning
Kosei Uemura, David Guzmán, Quang Phuoc Nguyen, Jesujoba Oluwadara Alabi, En-shiun Annie Lee, David Ifeoluwa Adelani
TL;DR
MERLIN is a two-stage curriculum alignment framework that fuses a multilingual encoder with an LLM to improve cross-lingual reasoning in low-resource languages. In Stage I, a lightweight connector is trained with a three-part curriculum (General Mapping, Question Alignment, and Task-aware Augmentation) to project encoder outputs into the LLM’s embedding space without updating the LLM. Stage II then applies DoRA-based, parameter-efficient fine-tuning inside the decoder: the encoder and the LLM backbone stay frozen, and only a small set of low-rank weights is adapted. Across MGSM, MSVAMP, AfriMGSM, and AfriXNLI, MERLIN achieves state-of-the-art results and substantial gains over strong baselines, particularly in low-resource languages, while remaining competitive in high-resource languages. The results highlight the importance of cross-lingual embedding alignment and mid-layer decoder adaptation for reliable multilingual reasoning, and they show that the approach can be deployed on a modest computational budget. Limitations include reliance on machine-translated data and a limited range of tasks, which points to future work on multi-task training, data filtering, and broader-domain evaluation.
Abstract
Large language models excel in English but still struggle with complex reasoning in many low-resource languages (LRLs). Existing encoder-plus-decoder methods such as LangBridge and MindMerger raise accuracy on mid- and high-resource languages, yet they leave a large gap on LRLs. We present MERLIN, a two-stage model-stacking framework that applies a curriculum learning strategy, progressing from general bilingual bitext to task-specific data, and adapts only a small set of DoRA weights. On the AfriMGSM benchmark, MERLIN improves exact-match accuracy by +12.9 pp over MindMerger and outperforms GPT-4o-mini. It also yields consistent gains on MGSM and MSVAMP (+0.9 and +2.8 pp), demonstrating effectiveness across both low- and high-resource settings.
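
To make "a small set of DoRA weights" concrete, the sketch below illustrates the general DoRA reparameterization: a frozen weight W0 is split into a per-column magnitude m and a direction, and only m and the low-rank factors B and A are trained, giving W' = m * (W0 + BA) / ||W0 + BA|| with the norm taken column-wise. This is a minimal illustration of the technique in general, not MERLIN's training code. The function names, matrix sizes, and values are hypothetical, and it makes no claim about which decoder layers MERLIN adapts.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def column_norms(w):
    """L2 norm of each column of w."""
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]

def dora_weight(w0, lora_b, lora_a, magnitude):
    """Effective weight under DoRA: magnitude * normalized (w0 + B @ A)."""
    delta = matmul(lora_b, lora_a)                      # low-rank update B @ A
    direction = [[w0[i][j] + delta[i][j] for j in range(len(w0[0]))]
                 for i in range(len(w0))]
    norms = column_norms(direction)                     # column-wise norm
    return [[magnitude[j] * direction[i][j] / norms[j]
             for j in range(len(w0[0]))] for i in range(len(w0))]

# Hypothetical frozen 2 x 3 pretrained weight (values chosen for illustration).
w0 = [[3.0, 0.0, 1.0],
      [4.0, 2.0, 1.0]]

# Trainable parts: rank-1 factors and the magnitude vector.
lora_b = [[0.0], [0.0]]          # zero-initialized, so B @ A = 0 at the start
lora_a = [[0.1, -0.2, 0.3]]
magnitude = column_norms(w0)     # initialized to the column norms of w0

# At initialization the effective weight reproduces the frozen weight.
w_init = dora_weight(w0, lora_b, lora_a, magnitude)

# After a (hypothetical) update to B, the direction changes while each
# column's norm stays equal to the corresponding magnitude entry.
w_updated = dora_weight(w0, [[1.0], [0.5]], lora_a, magnitude)
```

Because B starts at zero and m starts at the column norms of W0, fine-tuning begins exactly at the pretrained weights; W0 itself is never modified.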
