
MERIT: Multilingual Expert-Reward Informed Tuning for Chinese-Centric Low-Resource Machine Translation

Zhixiang Lu, Chong Zhang, Chenyu Xue, Angelos Stefanidis, Chong Li, Jionglong Su, Zhengyong Jiang

Abstract

Neural machine translation (NMT) from Chinese to low-resource Southeast Asian languages remains severely constrained by the extreme scarcity of clean parallel corpora and the pervasive noise in existing mined data. This chronic shortage not only impedes effective model training but also sustains a large performance gap with high-resource directions, leaving millions of speakers of languages such as Lao, Burmese, and Tagalog with persistently low-quality translation systems despite recent advances in large multilingual models. We introduce Multilingual Expert-Reward Informed Tuning (MERIT), a unified translation framework that transforms the traditional English-centric ALT benchmark into a Chinese-centric evaluation suite for five Southeast Asian low-resource languages (LRLs). Our framework combines language-specific token prefixing (LTP) with supervised fine-tuning (SFT) and a novel group relative policy optimization (GRPO) guided by a semantic alignment reward (SAR). Our results confirm that, in LRL→Chinese translation, targeted data curation and reward-guided optimization dramatically outperform mere model scaling.
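The language-specific token prefixing (LTP) mentioned above follows the common multilingual-NMT practice of prepending a target-language tag to each source sentence so that a single model can serve all translation directions. A minimal sketch, assuming hypothetical `<2xx>` tag names; the paper's exact token format is not given here:

```python
# Hypothetical language tags for the five Southeast Asian LRLs and Chinese.
# The actual token inventory used by MERIT may differ.
LANG_TOKENS = {"lo": "<2lo>", "my": "<2my>", "tl": "<2tl>", "km": "<2km>", "zh": "<2zh>"}

def prefix_example(src_text: str, tgt_lang: str) -> str:
    """Prepend the target-language token, steering the model toward tgt_lang."""
    return f"{LANG_TOKENS[tgt_lang]} {src_text}"

# A Lao source sentence tagged for translation into Chinese:
tagged = prefix_example("ສະບາຍດີ", "zh")  # "<2zh> ສະບາຍດີ"
```

During SFT, every parallel pair is rewritten this way before tokenization, so direction control costs no architectural change.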

Paper Structure

This paper contains 37 sections, 14 equations, 3 figures, 6 tables, 2 algorithms.

Figures (3)

  • Figure 1: Overview of the MERIT framework. The pipeline consists of (a) heuristic data selection utilizing the Elite Parallel Data Sampler (EPDS) and Data Integrity Validation (DIV), (b) translation scoring via a QE agent trained on expert cross-reviewed quality data using GRPO-SAR, (c) reward-based data distillation, which uses the QE agent to rescore the entire training set and filter for high-quality translation data, and (d) final model optimization using SFT-LTP on this distilled dataset.
  • Figure 2: Performance–Scale Trade-offs of MERIT-3B and Baseline Models on Chinese-Centric Multilingual Translation. Comparison of BLEU-chrF scores against model size (log-scale) across MERIT-3B, open-source, and estimated closed-source models.
  • Figure 3: Training loss and reward evolution across SFT and GRPO strategies.
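GRPO, used in the QE-agent training stage of Figure 1, needs no learned value function: it scores a group of sampled translations for the same source and computes each sample's advantage relative to the group. A minimal sketch of the standard group-relative normalization (the SAR reward itself is paper-specific and not reproduced here; any scalar reward per sample would slot in):

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: center each sampled output's reward on the
    group mean and scale by the group standard deviation."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against a zero-spread group
    return [(r - mu) / sigma for r in rewards]

# Three candidate translations of one source, scored by the reward model:
adv = group_relative_advantages([0.2, 0.5, 0.8])
# Advantages sum to zero: above-average samples are reinforced,
# below-average samples are penalized.
```

The policy gradient then upweights tokens of high-advantage candidates, which is how the reward signal (e.g. SAR) shapes the translation model without a critic network.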