Extending Multilingual Machine Translation through Imitation Learning
Wen Lai, Viktor Hangya, Yingli Shen, Alexander Fraser
TL;DR
Problem: extend MNMT to new languages without access to original training data and without forgetting existing languages. Approach: formulate extension as imitation learning, using an expert to generate pseudo-parallel data and a learner to imitate data distribution and translation behavior with language-weighted objectives. Findings: Imit-MNMT improves translations between the new language and all existing languages while preserving original performance, reduces copy and off-target errors, and shows script-based transfer. Significance: enables scalable, data-efficient expansion of MNMT systems to many languages and can generalize to other models and NLP tasks.
Abstract
Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world's languages are still being left behind. We aim to extend large-scale MNMT models to incorporate a new language, enabling translation between this new language and all previously supported languages, even in the challenging scenario where only a parallel corpus between the new language and English is available. Previous methods, such as continued training on parallel data that includes the new language, often suffer from catastrophic forgetting, which degrades performance on other languages. We propose a novel approach, Imit-MNMT, which treats this task as an imitation learning problem, a technique widely used in computer vision but less explored in natural language processing. Specifically, we leverage an expert model to generate pseudo-parallel corpora between the new language and the existing languages. We then introduce a data distribution imitation strategy using language-specific weighting, alongside a translation behavior imitation mechanism. Extensive experiments show that our approach significantly improves translation performance between the new and existing languages while mitigating catastrophic forgetting.
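To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the expert translates the English side of the new-language/English corpus into each existing language, and that language-specific weighting means a weighted mean of per-language average losses. The names `build_pseudo_parallel`, `language_weighted_loss`, `expert_translate`, and the weight values are all hypothetical; the abstract does not specify how the weights are computed.

```python
from collections import defaultdict


def build_pseudo_parallel(expert_translate, new_sentences, english_sentences, target_langs):
    """Pair each new-language sentence with the expert's translation of its
    English counterpart into every existing language.

    Assumption: `expert_translate(text, tgt_lang)` wraps the frozen expert MNMT
    model; pivoting through the English side is an illustrative choice.
    """
    corpora = {}
    for lang in target_langs:
        corpora[lang] = [
            (new_sent, expert_translate(en_sent, lang))
            for new_sent, en_sent in zip(new_sentences, english_sentences)
        ]
    return corpora


def language_weighted_loss(per_example_losses, languages, weights):
    """Weighted mean of per-language average losses.

    Assumption: `weights` maps a language code to a non-negative importance;
    how such weights would be chosen is not stated in the abstract.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, lang in zip(per_example_losses, languages):
        sums[lang] += loss
        counts[lang] += 1
    total_weight = sum(weights[lang] for lang in sums)
    weighted = sum(weights[lang] * sums[lang] / counts[lang] for lang in sums)
    return weighted / total_weight


# Toy usage with a stand-in expert that only tags the target language.
def fake_expert(text, tgt_lang):
    return f"<{tgt_lang}> {text}"


pseudo = build_pseudo_parallel(fake_expert, ["x1", "x2"], ["hello", "world"], ["de", "fr"])
loss = language_weighted_loss([1.0, 3.0, 4.0], ["de", "de", "fr"], {"de": 1.0, "fr": 3.0})
```

Averaging within each language before weighting keeps a language with many examples from dominating the objective purely through its size, which is one plausible motivation for weighting by language rather than by example.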
