How effective is Multi-source pivoting for Translation of Low Resource Indian Languages?
Pranav Gaikwad, Meet Doshi, Raj Dabre, Pushpak Bhattacharyya
TL;DR
This paper investigates translating English into low-resource Indian languages using multi-source pivoting, which uses both the English source sentence and pivot-language sentences, typically from two pivot languages. It evaluates multiple architectural variants (LAM, cross-attention schemes, regularization) and data augmentation strategies (pivot-synthetic and target-synthetic) across Konkani, Manipuri, Sanskrit, and Bodo, with Hindi, Marathi, and Bengali as pivots. The main finding is that multi-source pivoting provides only marginal improvements over strong baselines, though synthetic data can provide additional gains; the results challenge prior claims and suggest that the benefits of pivoting depend on data quantity and training setup. The work presents multi-source pivoting as a promising direction for low-resource MT and offers guidance for future work on it and on the use of synthetic data.
Abstract
Machine Translation (MT) between linguistically dissimilar languages is challenging, especially due to the scarcity of parallel corpora. Prior works suggest that pivoting through a high-resource language can help translation into a related low-resource language. However, existing works tend to discard the source sentence when pivoting. Taking the case of English-to-Indian-language MT, this paper explores the 'multi-source translation' approach with pivoting, using both source and pivot sentences to improve translation. We conduct extensive experiments with various multi-source techniques for translating English to Konkani, Manipuri, Sanskrit, and Bodo, using Hindi, Marathi, and Bengali as pivot languages. We find that multi-source pivoting yields marginal improvements over the state-of-the-art, contrary to previous claims, but these improvements can be enhanced with synthetic target language data. We believe multi-source pivoting is a promising direction for low-resource translation.
