In-Domain African Languages Translation Using LLMs and Multi-armed Bandits

Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik

TL;DR

This work tackles domain adaptation for low-resource African language translation by reframing model selection as a contextual multi-armed bandit problem. A pool of MT models is treated as arms, with sentence-context features provided by a LaBSE-based encoder and rewards derived from domain-specific metrics (BLEU and COMET, or COMETKiwi when target translations are unavailable). The authors evaluate four bandit strategies—UCB, LinUCB, Neural LinUCB, and Thompson Sampling—across three languages (Igbo, Yoruba, Swahili) and three domains (News, Movies, Religious), showing that bandit-based selection can match or exceed the best single model while using far less data. Results demonstrate robust performance in both data-present and target-free settings, highlighting the approach’s practicality for data-constrained MT deployment. Overall, the paper provides a data-efficient, statistically grounded method for adaptive NMT model selection in domain-specific tasks.

Abstract

Neural Machine Translation (NMT) systems face significant challenges when working with low-resource languages, particularly in domain adaptation tasks. These difficulties arise from limited training data and suboptimal model generalization. As a result, selecting an optimal model for translation is crucial for achieving strong performance on in-domain data, particularly in scenarios where fine-tuning is not feasible or practical. In this paper, we investigate strategies for selecting the most suitable NMT model for a given domain using bandit-based algorithms, including Upper Confidence Bound, Linear UCB, Neural Linear Bandit, and Thompson Sampling. Our method addresses the resource constraints by facilitating optimal model selection with high confidence. We evaluate the approach across three African languages and three domains, demonstrating its robustness and effectiveness both when target data is available and when it is absent.
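
To make the bandit framing concrete, the following is a minimal sketch of the simplest strategy named above, Upper Confidence Bound (UCB1), applied to choosing among a pool of candidate models. It is an illustration under stated assumptions, not the authors' implementation: the class `UCB1Selector`, the function `simulate`, the exploration constant, and the synthetic reward values are all hypothetical. In the setting described here, the reward would instead come from a translation quality metric such as BLEU, COMET, or COMETKiwi scaled to [0, 1].

```python
import math
import random


class UCB1Selector:
    """UCB1 over a fixed pool of candidate models (the 'arms')."""

    def __init__(self, n_arms, exploration=2.0):
        self.counts = [0] * n_arms    # number of times each arm was chosen
        self.means = [0.0] * n_arms   # running mean reward per arm
        self.exploration = exploration  # assumed constant, not from the paper
        self.total = 0                # total number of updates so far

    def select(self):
        # Try every arm once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        # Score = empirical mean + exploration bonus that shrinks with use.
        scores = [
            mean + math.sqrt(self.exploration * math.log(self.total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental mean update for the chosen arm.
        self.counts[arm] += 1
        self.total += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(true_means, rounds=2000, seed=0):
    """Run UCB1 against synthetic rewards (stand-ins for metric scores)."""
    rng = random.Random(seed)
    selector = UCB1Selector(len(true_means))
    for _ in range(rounds):
        arm = selector.select()
        # Synthetic reward: Gaussian around a made-up mean, clipped to [0, 1].
        reward = min(1.0, max(0.0, rng.gauss(true_means[arm], 0.1)))
        selector.update(arm, reward)
    return selector


if __name__ == "__main__":
    result = simulate([0.2, 0.4, 0.7])  # hypothetical per-model quality
    print("pulls per model:", result.counts)
    print("estimated means:", [round(m, 3) for m in result.means])
```

With these made-up means, most pulls should concentrate on the third arm, while the weaker arms are sampled only often enough to rule them out. The contextual variants (Linear UCB, Neural Linear Bandit) would additionally condition the score on a sentence feature vector, which this sketch omits.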

Paper Structure

This paper contains 6 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Motivation for using reinforcement learning for model selection in machine translation: (a) Training on a large dataset may be inefficient or impractical in low-resource settings, (b) BLEU scores vary significantly across domains, making model selection unreliable, (c) Reinforcement learning enables efficient, statistically significant model selection with less data.
  • Figure 2: Block diagram of the proposed bandit-based model selection strategy.