X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
Haoran Xu, Kenton Murray, Philipp Koehn, Hieu Hoang, Akiko Eriguchi, Huda Khayrallah
TL;DR
X-ALMA introduces a plug-and-play, language-grouped architecture with language-specific (LS) modules to deliver uniformly high translation quality across 50 languages. It couples this architecture with a five-stage training recipe and a novel Adaptive Rejection Preference Optimization (ARPO) method that addresses over-rejection in MT preference learning. Together they achieve higher COMET-22 scores on FLORES-200 and WMT'23 than open multilingual models. Under XCOMET-XL, X-ALMA matches or exceeds the baselines in 97 of 98 translation directions, and the authors release preference data and checkpoints to support reproducibility. This work advances multilingual translation by combining modular design with targeted preference optimization, mitigating the trade-offs of multilinguality and the resource disparities across languages.
Abstract
Large language models (LLMs) have achieved remarkable success across various NLP tasks, with a focus on English due to English-centric pre-training and limited multilingual data. In this work, we focus on translation. While some multilingual LLMs claim to support hundreds of languages, they often fail to provide high-quality responses for mid- and low-resource languages, leading to imbalanced performance heavily skewed in favor of high-resource languages. We introduce **X-ALMA**, a model designed to ensure top-tier performance across 50 diverse languages, regardless of their resource levels. X-ALMA surpasses state-of-the-art open-source multilingual LLMs, such as Aya-101 and Aya-23, in every single translation direction on the FLORES-200 and WMT'23 test datasets according to COMET-22. This is achieved by a plug-and-play language-specific module architecture that prevents language conflicts during training, together with a carefully designed training regimen with novel optimization methods that maximize translation performance. In the final stage of this training regimen, our proposed **A**daptive **R**ejection **P**reference **O**ptimization (**ARPO**) surpasses existing preference optimization methods in translation tasks.
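The plug-and-play idea can be illustrated with a minimal Python sketch of the routing decision only. It assumes English-centric directions and a made-up assignment of languages to groups. The group labels, the `ModularTranslator` class, and its `plug`/`route` methods are hypothetical and are not taken from the X-ALMA release. Model internals, training, and ARPO are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical language-to-group assignment (illustrative only; not X-ALMA's actual grouping).
LANG_TO_GROUP = {
    "de": "g1", "nl": "g1", "sv": "g1",
    "ru": "g2", "uk": "g2", "bg": "g2",
    "zh": "g3", "ja": "g3", "ko": "g3",
}


@dataclass
class ModularTranslator:
    """Sketch: a shared base model plus one pluggable module per language group."""
    base_model: str
    modules: dict = field(default_factory=dict)  # group label -> module identifier

    def plug(self, group: str, module_id: str) -> None:
        # Registering one group's module leaves every other group's module untouched.
        self.modules[group] = module_id

    def route(self, src: str, tgt: str) -> tuple:
        # Assumed English-centric directions: the non-English side selects the group.
        if "en" not in (src, tgt) or src == tgt:
            raise ValueError(f"expected an English-centric direction, got {src}->{tgt}")
        other = tgt if src == "en" else src
        group = LANG_TO_GROUP.get(other)
        if group is None or group not in self.modules:
            raise KeyError(f"no module plugged in for language {other!r}")
        # Only the shared base and the selected group's module are needed for this direction.
        return self.base_model, self.modules[group]


if __name__ == "__main__":
    t = ModularTranslator(base_model="shared-base")
    t.plug("g2", "module-g2")
    print(t.route("en", "uk"))  # ('shared-base', 'module-g2')
```

Because each direction touches only the shared base and a single group module, adding or swapping one group's module in this sketch cannot change the behavior of another group, which is the intuition behind keeping languages from conflicting with one another.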
