Beyond MLE: Investigating SEARNN for Low-Resourced Neural Machine Translation

Chris Emezue

TL;DR

This work evaluates SEARNN, a learning-to-search approach, for training RNN-based neural machine translation in low-resource settings. By leveraging roll-in/roll-out trajectories and cost-sensitive losses (log-loss and KL loss), the method addresses exposure bias and metric misalignment inherent to MLE. On English→Igbo, French→Éwé, and French→Ghomálá using the MAFAND-MT corpus, SEARNN yields an average BLEU improvement of $5.4\%$ over MLE, demonstrating viability for morphologically rich, data-scarce languages. The study highlights SEARNN's potential and suggests further gains via speedups and integration with stronger architectures like Transformers.
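
The two cost-sensitive losses named above can be illustrated with a small sketch. The summary does not spell out their exact form, so the definitions here are assumptions: the log-loss is taken as cross-entropy against the single lowest-cost token, and the KL loss as the divergence between a softmax over negated costs and the model's softmax over scores. The function names and the `temperature` parameter are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def log_loss(scores, costs):
    """Cross-entropy against the lowest-cost token (ties go to the first index).

    Assumed form: -log softmax(scores)[argmin costs].
    """
    best = min(range(len(costs)), key=lambda a: costs[a])
    return -math.log(softmax(scores)[best])


def kl_loss(scores, costs, temperature=1.0):
    """KL(target || model), with target = softmax(-costs / temperature).

    Assumed form: the costs are turned into a distribution that favours
    cheap tokens, and the model distribution is pulled towards it.
    """
    target = softmax([-c / temperature for c in costs])
    model = softmax(scores)
    return sum(p * math.log(p / q) for p, q in zip(target, model) if p > 0.0)


# One RNN cell with a three-token vocabulary (toy numbers).
scores = [2.0, 0.5, -1.0]  # model scores for each candidate token
costs = [1.0, 0.0, 2.0]    # roll-out cost of each candidate token

ll = log_loss(scores, costs)
kl = kl_loss(scores, costs)
```

Unlike MLE, neither loss looks only at the ground-truth token: both depend on the full vector of costs, so the training signal reflects how good each alternative token would have been under the evaluation cost.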

Abstract

Structured prediction tasks, like machine translation, involve learning functions that map structured inputs to structured outputs. Recurrent Neural Networks (RNNs) have historically been a popular choice for such tasks, including in natural language processing (NLP) applications. However, training RNNs using Maximum Likelihood Estimation (MLE) has its limitations, including exposure bias and a mismatch between training and testing metrics. SEARNN, based on the learning to search (L2S) framework, has been proposed as an alternative to MLE for RNN training. This project explored the potential of SEARNN to improve machine translation for low-resourced African languages -- a challenging task characterized by limited training data and the morphological complexity of the languages. Through experiments on the English to Igbo, French to Éwé, and French to Ghomálá translation directions, this project evaluated the efficacy of SEARNN over MLE in addressing the unique challenges posed by these languages. With an average BLEU score improvement of $5.4\%$ over the MLE objective, we show that SEARNN is a viable algorithm for training RNNs on machine translation for low-resourced languages.

Paper Structure (12 sections, 3 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: "Illustration of the roll-in/roll-out mechanism used in SEARNN. The goal is to obtain a vector of costs for each cell of the RNN in order to define a cost-sensitive loss to train the network. These vectors have one entry per possible token. Here, we show how to obtain the vector of costs for the red cell. First, we use a roll-in policy to predict until the cell of interest. We highlight here the learned policy where the network passes its own prediction to the next cell. Second, we proceed to the roll-out phase. We feed every possible token (illustrated by the red letters) to the next cell and let the model predict the full sequence. For each token $a$, we obtain a predicted sequence $\hat{y}_a$. Comparing it to the ground truth sequence $y$ yields the associated cost $c(a)$." -- searnn
  • Figure 2: NMT model architecture
  • Figure 3: BLEU score on train and test set for the three translation directions.
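
The roll-in/roll-out procedure described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the paper's implementation: the policy is a toy function standing in for the RNN, roll-outs run to the ground-truth length rather than to an end-of-sequence token, and a Hamming-style mismatch count stands in for the sequence-level cost. All names (`roll_in`, `roll_out`, `cost_vector`, `repeat_last`) are hypothetical.

```python
from typing import Callable, Dict, List

# A policy maps the tokens produced so far to the next token.
Policy = Callable[[List[str]], str]


def mismatch_cost(pred: List[str], gold: List[str]) -> int:
    """Number of mismatched positions (assumed stand-in for the real cost)."""
    return sum(p != g for p, g in zip(pred, gold)) + abs(len(pred) - len(gold))


def roll_in(policy: Policy, steps: int) -> List[str]:
    """Predict up to the cell of interest, feeding the policy its own outputs."""
    prefix: List[str] = []
    for _ in range(steps):
        prefix.append(policy(prefix))
    return prefix


def roll_out(policy: Policy, prefix: List[str], length: int) -> List[str]:
    """Complete a sequence from the given prefix until it reaches `length`."""
    seq = list(prefix)
    while len(seq) < length:
        seq.append(policy(seq))
    return seq


def cost_vector(policy: Policy, vocab: List[str], gold: List[str], t: int) -> Dict[str, int]:
    """Cost c(a) for every token a that could be emitted at cell t."""
    prefix = roll_in(policy, t)
    costs: Dict[str, int] = {}
    for a in vocab:
        y_hat_a = roll_out(policy, prefix + [a], len(gold))  # try token a, then finish
        costs[a] = mismatch_cost(y_hat_a, gold)               # compare with ground truth
    return costs


def repeat_last(prefix: List[str]) -> str:
    """Toy 'learned' policy: start with "a", then repeat the previous token."""
    return prefix[-1] if prefix else "a"


vocab = ["a", "b", "c"]
gold = ["a", "b", "b", "c"]
costs = cost_vector(repeat_last, vocab, gold, t=1)
```

In this toy run the cheapest token at the second cell is "b", since its roll-out matches the ground truth in all but the last position. Each cell needs one roll-out per vocabulary token, which is where the computational cost of the approach comes from.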