Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages

Sourabh Deoghare, Diptesh Kanojia, Pushpak Bhattacharyya

TL;DR

This exploratory study investigates whether multilingual Automatic Post-Editing (APE) systems can improve machine translation quality for low-resource Indo-Aryan languages, and develops a multilingual APE model that outperforms the corresponding English-Hindi and English-Marathi single-pair models.

Abstract

This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of machine translations for low-resource Indo-Aryan languages. Focusing on two closely related language pairs, English-Marathi and English-Hindi, we exploit their linguistic similarities to develop a robust multilingual APE model. To facilitate cross-linguistic transfer, we generate synthetic Hindi-Marathi and Marathi-Hindi APE triplets. Additionally, we incorporate a Quality Estimation (QE)-APE multi-task learning framework. The experimental results underline the complementary nature of APE and QE, and we also observe that QE-APE multi-task learning facilitates effective domain adaptation. Our experiments demonstrate that the multilingual APE models outperform the corresponding English-Hindi and English-Marathi single-pair models by $2.5$ and $2.39$ TER points, respectively, with further notable improvements over the multilingual APE model observed through multi-task learning ($+1.29$ and $+1.44$ TER points), data augmentation ($+0.53$ and $+0.45$ TER points) and domain adaptation ($+0.35$ and $+0.45$ TER points). We release the synthetic data, code, and models produced during this study publicly at https://github.com/cfiltnlp/Multilingual-APE.
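
The gains above are reported in TER (Translation Edit Rate) points. TER is an error rate, so a better system has a lower score. As a rough illustration of what one TER point measures, the sketch below computes a simplified, shift-free variant: the word-level edit distance between a hypothesis and a reference, divided by the reference length and scaled to 100. This is an assumption-laden simplification and not the evaluation setup of the paper: standard TER also counts block shifts as single edits, and the function names `word_edit_distance` and `simplified_ter` are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Token-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))  # distance from empty hypothesis to ref[:j]
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to an empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h from the hypothesis
                            curr[j - 1] + 1,      # insert r into the hypothesis
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis, reference):
    """Shift-free approximation of TER, in points (0 = identical to reference).

    Assumption: whitespace tokenization and no block-shift edits, so this
    can overestimate true TER when words are merely reordered.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return 100.0 * word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    mt_output = "the cat sat on mat"
    post_edit = "the cat sat on the mat"
    print(round(simplified_ter(mt_output, post_edit), 2))
```

Under this simplified definition, one missing word against a six-word reference costs about 16.67 points, which gives a sense of scale for corpus-level differences of one or two TER points.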

Paper Structure

This paper contains 30 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: APE model architecture (Deoghare et al., 2023).
  • Figure 2: Comparison of English-Marathi and English-Hindi APE outputs obtained from Baseline APE and two MAPE systems.
  • Figure 3: Comparison of English-Marathi and English-Hindi APE outputs obtained from Baseline APE and MAPE systems.
  • Figure 4: Comparison of English-Marathi and English-Hindi APE outputs obtained from Baseline APE and MAPE systems.