FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation
Shaolin Zhu, Tianyu Dong, Bo Li, Deyi Xiong
TL;DR
FuxiMT addresses the limited coverage of Chinese-centric multilingual MT by sparsifying a BLOOMz LLM with Mixture-of-Experts (MoE) layers in its decoder and training it in two stages. The first stage is Chinese-centric pre-training on 5B tokens; the second is multilingual fine-tuning on over 100B parallel sentences across 65 languages, using curriculum learning and back-translation. The model achieves substantial gains over strong baselines, particularly in low-resource and zero-shot settings, which indicates effective cross-lingual transfer, while the sparse MoE design keeps computation efficient. The work suggests that keeping the backbone frozen and routing inputs through specialized experts is a viable way to bridge linguistic gaps across a broad set of languages.
Abstract
In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). We adopt a two-stage strategy to train FuxiMT. We first pre-train the model on a massive Chinese corpus and then conduct multilingual fine-tuning on a large parallel dataset encompassing 65 languages. FuxiMT incorporates Mixture-of-Experts (MoEs) and employs a curriculum learning strategy for robust performance across various resource levels. Experimental results demonstrate that FuxiMT significantly outperforms strong baselines, including state-of-the-art LLMs and machine translation models, particularly under low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot translation capabilities for unseen language pairs, indicating its potential to bridge communication gaps where parallel data are scarce or unavailable.
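To make the Mixture-of-Experts idea concrete, the following is a minimal sketch of generic top-k expert routing in plain Python. It is not FuxiMT's actual router: the function names (softmax, top_k_route, moe_forward), the choice of k = 2, and the toy scaling experts are all assumptions made for illustration. The sketch shows only the general mechanism, in which a gate scores every expert, the k highest-scoring experts are selected, and their outputs are mixed with renormalized gate weights.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts (hypothetical helper).

    Returns a list of (expert_index, weight) pairs whose weights are
    renormalized so that they sum to 1 over the selected experts.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_logits, k=2):
    """Mix the outputs of the selected experts for one token vector x.

    `experts` is a list of callables mapping a list of floats to a list
    of floats of the same length. Only the k selected experts are run,
    which is where the compute savings of a sparse layer come from.
    """
    out = [0.0] * len(x)
    for idx, weight in top_k_route(gate_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

def make_scaling_expert(scale):
    """Toy expert (assumption): multiplies every component by `scale`."""
    return lambda vec: [scale * v for v in vec]

# Four toy experts standing in for learned feed-forward blocks.
toy_experts = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
```

In a trained model the gate logits would come from a learned projection of the token representation; here they are passed in directly so that the routing step can be inspected in isolation, for example with `moe_forward([1.0, 2.0], toy_experts, [2.0, 0.5, 1.0, -1.0])`.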
