Balancing Training for Multilingual Neural Machine Translation
Xinyi Wang, Yulia Tsvetkov, Graham Neubig
TL;DR
The paper tackles imbalanced training data in multilingual MT by learning a language scorer through differentiable data selection. It formulates a bi-level optimization that jointly updates the MT model and the language-sampling policy. The method extends the Differentiable Data Selection (DDS) framework to Multilingual DDS (MultiDDS) and adds a variant with a stabilized reward (MultiDDS-S) to make optimization over multiple languages more stable. Experiments on two sets of languages drawn from the 58-language TED data show that MultiDDS-S consistently outperforms heuristic baselines in average performance across one-to-many and many-to-one settings, and that it lets users prioritize performance on chosen languages. The approach is described as model-agnostic and memory-efficient, and it could in principle be applied to multilingual tasks beyond MT.
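To make the idea of a learned language-sampling policy concrete, here is a toy sketch of one scorer update. It is an illustration only, not the paper's implementation: the reward (average cosine similarity between a training language's gradient and each development language's gradient), the REINFORCE-style update, and all names (`scorer_update`, `psi`, `train_grads`, `dev_grads`) are assumptions introduced here, and the gradients are hand-written vectors rather than gradients of a real MT model.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def cosine(a, b):
    """Cosine similarity between two gradient vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def scorer_update(psi, train_grads, dev_grads, lr=0.1):
    """One hypothetical update of the language-scorer logits `psi`.

    Assumed reward for training language i: mean cosine similarity between
    its gradient and the gradient on each development language.
    """
    probs = softmax(psi)
    rewards = np.array([
        np.mean([cosine(g_train, g_dev) for g_dev in dev_grads])
        for g_train in train_grads
    ])
    # sum_i R_i * d/dpsi log P(i), where d/dpsi log P(i) = onehot(i) - probs
    grad = rewards - rewards.sum() * probs
    return psi + lr * grad, rewards

# Toy setup: language 0's gradient agrees with the dev gradients, language 1's opposes them.
train_grads = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
dev_grads = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]
psi = np.zeros(2)

new_psi, rewards = scorer_update(psi, train_grads, dev_grads)
new_probs = softmax(new_psi)
```

In this toy case the scorer shifts sampling probability toward language 0, whose gradient points in the same direction as the development gradients. In a full bi-level loop, a step like this would alternate with ordinary MT model updates on data sampled from the current policy.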
Abstract
When training multilingual machine translation (MT) models that can translate to/from multiple languages, we are faced with imbalanced training sets: some languages have much more training data than others. Standard practice is to up-sample lower-resourced languages to increase their representation, and the degree of up-sampling has a large effect on overall performance. In this paper, we propose a method that instead automatically learns how to weight training data through a data scorer that is optimized to maximize performance on all test languages. Experiments on two sets of languages under both one-to-many and many-to-one MT settings show that our method not only consistently outperforms heuristic baselines in terms of average performance, but also offers flexible control over which languages' performance is optimized.
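The up-sampling heuristic mentioned above is often implemented as temperature-based sampling, where each language is sampled with probability proportional to its dataset size raised to the power 1/τ. The sketch below assumes that formulation; the function name, the temperature values, and the dataset sizes are hypothetical and not taken from the paper.

```python
def temperature_sampling_probs(sizes, tau):
    """Return per-language sampling probabilities proportional to size ** (1 / tau).

    tau = 1 samples in proportion to dataset size; larger tau flattens the
    distribution toward uniform, up-sampling lower-resourced languages.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    weights = {lang: n ** (1.0 / tau) for lang, n in sizes.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}

# Hypothetical corpus sizes (number of sentence pairs per language).
sizes = {"high": 900_000, "low": 100_000}

proportional = temperature_sampling_probs(sizes, tau=1.0)
upsampled = temperature_sampling_probs(sizes, tau=5.0)
```

With these illustrative sizes, raising τ from 1 to 5 increases the share of the lower-resourced language well above its proportional 10%. The choice of τ is the "degree of up-sampling" that a learned data scorer is meant to replace.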
