Choose the Final Translation from NMT and LLM hypotheses Using MBR Decoding: HW-TSC's Submission to the WMT24 General MT Shared Task

Zhanglin Wu; Daimeng Wei; Zongyao Li; Hengchao Shang; Jiaxin Guo; Shaojun Li; Zhiqiang Rao; Yuanchang Luo; Ning Xie; Hao Yang

TL;DR

By using minimum Bayes risk (MBR) decoding to select the final translation from multiple hypotheses generated by NMT and LLM-based MT models, the submission of Huawei Translate Services Center (HW-TSC) achieves competitive results in the final evaluation.

Abstract

This paper presents the submission of Huawei Translate Services Center (HW-TSC) to the WMT24 general machine translation (MT) shared task, in which we participate in the English-to-Chinese (en2zh) language pair. As in previous years, we train a neural machine translation (NMT) model based on the deep Transformer-big architecture using strategies such as regularized dropout, bidirectional training, data diversification, forward translation, back translation, alternated training, curriculum learning, and transductive ensemble learning. New this year, we also train a large language model (LLM) based MT model using continued pre-training, supervised fine-tuning, and contrastive preference optimization. By using minimum Bayes risk (MBR) decoding to select the final translation from multiple hypotheses generated by the NMT and LLM-based MT models, our submission achieves competitive results in the final evaluation.
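
The MBR selection step described above can be illustrated with a small sketch. It pools hypotheses from both systems and picks the one with the highest average utility against all other hypotheses, which serve as pseudo-references. This summary does not say which utility metric the paper uses, so a simplified sentence-level chrF (character n-gram F-score) stands in here purely as an assumption; `char_ngrams`, `chrf`, and `mbr_select` are hypothetical helper names, not the authors' code.

```python
from collections import Counter
from typing import Callable, List


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplification of chrF)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hyp: str, ref: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average n-gram precision/recall, combined as an F-beta score.

    Assumed stand-in utility; the paper's actual MBR metric is not stated here.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:
            continue  # skip n-gram orders longer than either string
        overlap = sum((h & r).values())  # clipped n-gram matches
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    rc = sum(recalls) / len(recalls)
    if p + rc == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * rc / (b2 * p + rc)


def mbr_select(hypotheses: List[str],
               utility: Callable[[str, str], float] = chrf) -> str:
    """Return the hypothesis with the highest mean utility against the others."""
    if len(hypotheses) == 1:
        return hypotheses[0]
    best, best_score = hypotheses[0], float("-inf")
    for i, h in enumerate(hypotheses):
        # Every other hypothesis in the pool acts as a pseudo-reference.
        score = sum(utility(h, r) for j, r in enumerate(hypotheses) if j != i)
        score /= len(hypotheses) - 1
        if score > best_score:  # ties keep the earlier hypothesis
            best, best_score = h, score
    return best


if __name__ == "__main__":
    nmt_hyps = ["the cat sat on the mat", "a cat sat on the mat"]
    llm_hyps = ["the cat sat on the mat", "the cat is sitting on the mat"]
    print(mbr_select(nmt_hyps + llm_hyps))
```

Because the candidate pool and the pseudo-reference pool are the same, the selected translation is the one that agrees most with the consensus of both systems. Swapping in a stronger utility only requires passing a different `utility` callable.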

Paper Structure (26 sections, 1 equation, 4 figures, 2 tables)

Introduction
Data
Data Source
NMT Data Pre-processing
LLM-based MT Data Pre-processing
NMT System
System Overview
Regularized Dropout
Bidirectional Training
Data Diversification
Forward Translation
Back Translation
Alternated Training
Curriculum Learning
Transductive Ensemble Learning
...and 11 more sections

Figures (4)

Figure 1: CPT, SFT and CPO data templates used for LLM-based MT training.
Figure 2: The overall training flow of NMT system.
Figure 3: The training flow of LLM-based MT system.
Figure 4: Choose the Final Translation from NMT and LLM hypotheses Using MBR Decoding.