Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study

Aniruddha Roy; Pretam Ray; Ayush Maheshwari; Sudeshna Sarkar; Pawan Goyal

TL;DR

This work tackles low-resource NMT for Indic languages not covered by mBART-50 by repurposing a multilingual encoder (XLM-R large) within a seq2seq framework and applying complementary knowledge distillation (CKD). The proposed XLM-MT and XLM-MT+CKD models achieve significant improvements in BLEU-4 and chrF over baselines across three Indic languages and four translation directions, and human judgments provide additional validation. The paper also analyzes model variants, presents a case study, and compares the approach against mBART-50 on overlapping language pairs, highlighting complementary strengths. The authors release open-source code, contributing a practical method for expanding multilingual NMT coverage without catastrophic forgetting.

Abstract

Neural Machine Translation (NMT) remains a formidable challenge, especially for low-resource languages. Pre-trained sequence-to-sequence (seq2seq) multilingual models, such as mBART-50, have demonstrated impressive performance on various low-resource NMT tasks. However, their pre-training has been confined to 50 languages, leaving numerous low-resource languages unsupported, particularly those spoken in the Indian subcontinent. Expanding mBART-50's language support requires complex pre-training and risks a performance decline due to catastrophic forgetting. To address these challenges, this paper explores a framework that leverages the benefits of a pre-trained language model along with knowledge distillation in a seq2seq architecture to facilitate translation for low-resource languages, including those not covered by mBART-50. The proposed framework employs a multilingual encoder-based seq2seq model as the foundational architecture and subsequently uses complementary knowledge distillation techniques to mitigate the impact of imbalanced training. Our framework is evaluated on three low-resource Indic languages in four Indic-to-Indic directions, yielding significant BLEU-4 and chrF improvements over baselines. Further, we conduct human evaluation to confirm the effectiveness of our approach. Our code is publicly available at https://github.com/raypretam/Two-step-low-res-NMT.
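
To make the role of knowledge distillation more concrete, the following is a minimal sketch of a generic token-level distillation objective, in which a student's cross-entropy loss on the reference token is mixed with a KL term that pulls the student's distribution toward a teacher's. It is not the complementary knowledge distillation procedure of the paper, whose details are not given in the abstract. The function names, the mixing weight `alpha`, and the `temperature` parameter are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, gold_index,
                      alpha=0.5, temperature=1.0):
    """Generic distillation loss for one target token (illustrative only).

    loss = (1 - alpha) * CE(student, gold) + alpha * KL(teacher || student)
    alpha and temperature are hypothetical hyperparameters.
    """
    # Cross-entropy of the student against the reference token.
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[gold_index])

    # KL divergence from the (softened) teacher to the (softened) student.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student) if t > 0)

    return (1 - alpha) * cross_entropy + alpha * kl
```

When the teacher and student produce identical logits, the KL term vanishes and the loss reduces to the weighted cross-entropy alone; in a full training loop this quantity would be averaged over all target tokens in a batch.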
