ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation

Shaojie Dai, Xin Liu, Ping Luo, Yue Yu

TL;DR

This work targets off-target failures that arise when large language models are used for multilingual neural machine translation. It introduces ACT-MNMT, a supervised fine-tuning approach that constrains outputs via trigger tokens on the target side. Two methods are developed: a hard-constrained template (TECT-MNMT) and a soft, trigger-token-based auto-constriction (ACT-MNMT), both designed to reduce instruction misunderstanding, wrong-language generation, source copying, and length inconsistencies. Across eight language pairs and multiple WMT directions, ACT-MNMT and TECT-MNMT outperform prompt-tuning baselines, with ACT-MNMT offering robust performance and substantial reductions in off-target metrics across model sizes. The study also reports strong data scalability and model-size robustness, and its ablations highlight the value of task-descriptive and direction-specific trigger information. Overall, the proposed constrained-turning framework provides a practical route to improving MNMT with LLMs that is orthogonal to prompt-based methods and potentially extendable to autoregressive models.

Abstract

Large language models (LLMs) have achieved promising performance in multilingual machine translation through zero/few-shot prompting or prompt-tuning. However, because multilingual data is mixed during LLM pre-training, LLM-based translation models face the off-target issue under both kinds of prompt-based methods. The issue covers a series of phenomena, namely instruction misunderstanding, translation into the wrong language, and over-generation. To address it, this paper introduces an Auto-Constriction Turning mechanism for Multilingual Neural Machine Translation (ACT-MNMT), a novel supervised fine-tuning mechanism that is orthogonal to traditional prompt-based methods. In this method, ACT-MNMT automatically constructs a constrained template on the target side by adding trigger tokens ahead of the ground truth. Furthermore, trigger tokens can be arranged and combined freely to represent different task semantics, and they can be iteratively updated to maximize the label likelihood. Experiments are performed on WMT test sets with multiple metrics, and the results demonstrate that ACT-MNMT achieves substantially improved performance across multiple translation directions and reduces off-target phenomena in translation.
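
The target-side template described in the abstract can be illustrated with a minimal sketch. It covers only the string-level construction, in which trigger tokens are placed ahead of the ground truth. In the paper the trigger tokens are trainable and updated during fine-tuning, which this sketch does not model. The token names (`<common_0>`, `<de-en_0>`), the helper functions, and the space-joined format are hypothetical; the default counts of one common and two direction-specific tokens follow the example in Figure 2.

```python
from typing import List

NUM_COMMON = 1    # common trigger tokens shared by all directions (Figure 2 example)
NUM_SPECIFIC = 2  # direction-specific trigger tokens (Figure 2 example)


def trigger_tokens(direction: str,
                   n_common: int = NUM_COMMON,
                   n_specific: int = NUM_SPECIFIC) -> List[str]:
    """Return the (hypothetical) trigger tokens for one translation direction."""
    common = [f"<common_{i}>" for i in range(n_common)]
    specific = [f"<{direction}_{i}>" for i in range(n_specific)]
    return common + specific


def build_target(direction: str, reference: str) -> str:
    """Prepend the trigger tokens to the ground-truth translation."""
    return " ".join(trigger_tokens(direction) + [reference])


def strip_triggers(direction: str, generated: str) -> str:
    """Remove the trigger-token prefix from a generated sequence, if present."""
    prefix = " ".join(trigger_tokens(direction)) + " "
    return generated[len(prefix):] if generated.startswith(prefix) else generated


target = build_target("de-en", "Hello world.")
print(target)
print(strip_triggers("de-en", target))
```

With a real model, each trigger token would presumably be registered as a new vocabulary entry so that its embedding can be trained; that step is omitted here.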

Paper Structure

This paper contains 20 sections, 3 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Off-target ratio on the IWSLT 2017 test datasets. To evaluate the off-target ratio between any pair of languages, we extract 488 sentences with identical English content from the IWSLT 2017 test dataset (Cettolo et al., 2017); each sentence has 10 translations in different languages.
  • Figure 2: An overview of ACT-MNMT applied to multilingual neural machine translation. In this example, each translation direction has 1 common trigger token and 2 specific trigger tokens.
  • Figure 3: Parameter sensitivity w.r.t. model size.
  • Figure 4: Parameter sensitivity w.r.t. the number of DE-EN language pairs used for training.
  • Figure 5: Over/Under-generation ratio on IWSLT 2017 test datasets.
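
The off-target ratio reported in Figure 1 can be approximated as the share of outputs whose detected language differs from the requested target language. The sketch below assumes that definition, which may differ from the paper's exact procedure. The function names are hypothetical, and `toy_detect` is a deliberately crude stand-in for a real language identifier.

```python
from typing import Callable, Sequence


def off_target_ratio(outputs: Sequence[str],
                     target_lang: str,
                     detect_lang: Callable[[str], str]) -> float:
    """Fraction of outputs whose detected language is not the target language."""
    if not outputs:
        return 0.0
    wrong = sum(1 for text in outputs if detect_lang(text) != target_lang)
    return wrong / len(outputs)


def toy_detect(text: str) -> str:
    """Toy detector (assumption): text containing the word 'the' is English, else German."""
    return "en" if "the" in text.lower().split() else "de"


outputs = ["the cat sleeps", "die Katze schläft", "the dog barks", "the end"]
print(off_target_ratio(outputs, "en", toy_detect))
```

Passing the detector in as a callable keeps the metric independent of any particular language-identification library.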