Pivot Language for Low-Resource Machine Translation
Abhimanyu Talwar, Julien Laasri
TL;DR
This study tackles low-resource Nepali-English translation by introducing Hindi as a pivot language. It evaluates a fully supervised Transfer method and a semi-supervised Backtranslation approach; the transfer pipeline reaches a devtest SacreBLEU score of 14.2, a substantial improvement over a prior fully supervised baseline. The paper compiles diverse Nepali–Hindi and Hindi–English corpora, leverages large Hindi monolingual data, and analyzes pivot-related factors such as historical relatedness, reordering, and morphology. Overall, pivot-based transfer yields strong gains, though backtranslation can introduce noise. This motivates future work on synthetic data strategies and extended training to close the gap with semi-supervised baselines and improve robustness.
Abstract
Some language pairs lack a parallel corpus that is both large and diverse in domain. One way to overcome this is to use a pivot language. In this paper, we use Hindi as a pivot language to translate Nepali into English. We describe what makes Hindi a good candidate for the pivot. We discuss ways in which a pivot language can be used and apply two such approaches, the Transfer Method (fully supervised) and Backtranslation (semi-supervised), to translate Nepali into English. Using the former, we achieve a devtest set SacreBLEU score of 14.2, which improves on the fully supervised baseline score reported by Guzman et al. (2019) by 6.6 points. Although this is slightly below the semi-supervised baseline score of 15.1, we discuss what may have caused this under-performance and suggest directions for future work.
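To make the pivot idea concrete, the following is a minimal sketch, assuming the Transfer Method amounts to chaining a Nepali-to-Hindi system with a Hindi-to-English system. The names `make_pivot_translator` and `word_lookup`, and the placeholder tokens such as `ne_water`, are hypothetical and used only for illustration; the toy lookup tables stand in for trained translation models and are not the systems used in this paper.

```python
from typing import Callable, Dict

# A translator is modeled as any function from a sentence to a sentence.
Translator = Callable[[str], str]


def make_pivot_translator(src_to_pivot: Translator,
                          pivot_to_tgt: Translator) -> Translator:
    """Chain two translators through a pivot language (hypothetical helper)."""
    def translate(sentence: str) -> str:
        pivot_sentence = src_to_pivot(sentence)   # e.g. Nepali -> Hindi
        return pivot_to_tgt(pivot_sentence)       # e.g. Hindi -> English
    return translate


def word_lookup(table: Dict[str, str]) -> Translator:
    """Toy stand-in for a trained model: token-by-token dictionary lookup.

    Unknown tokens are passed through unchanged.
    """
    def translate(sentence: str) -> str:
        return " ".join(table.get(tok, tok) for tok in sentence.split())
    return translate


# Placeholder vocabularies (assumptions, not real Nepali or Hindi text).
ne_to_hi = word_lookup({"ne_water": "hi_water", "ne_cold": "hi_cold"})
hi_to_en = word_lookup({"hi_water": "en_water", "hi_cold": "en_cold"})

ne_to_en = make_pivot_translator(ne_to_hi, hi_to_en)
print(ne_to_en("ne_water ne_cold"))
```

In this sketch, errors made by the first stage are passed directly to the second, which is one reason the choice of a closely related pivot language matters.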
