Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning

Changtong Zan, Liang Ding, Li Shen, Yibing Zhen, Weifeng Liu, Dacheng Tao

TL;DR

The paper tackles off-target translation (the model answering in the wrong language) in zero-shot translation by LLMs, a problem that is especially severe for low-resource languages. It introduces a two-stage fine-tuning framework: stage 1 pre-tunes the LLM on multilingual translation data with a maximum-likelihood objective to unlock its translation ability, and stage 2 adds instruction-conflicting samples, trained with an unlikelihood loss, to enforce correct language-direction following. Experiments with LLaMA on IWSLT and WMT across 16 zero-shot directions show a large reduction in the off-target ratio (53.3% on average) and average gains of +5.7 SacreBLEU and +16.4 BLEURT, while preserving supervised translation and general task performance. The approach offers a practical way to build translation-tailored LLMs that reliably follow the requested translation direction; code and models are to be released.
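The two objectives can be written down as follows. This is a minimal sketch using the standard token-level forms of MLE and unlikelihood training; the exact formulation, the normalization, and how the mixing hyper-parameter $\alpha$ (varied in the Figure 3 ablation) combines the terms are assumptions here, not taken from the paper. Let $x$ be the source sentence, $I$ the correct instruction, $\tilde{I}$ an instruction-conflicting instruction naming a wrong target language, and $y$ the reference translation. The unlikelihood term pushes down the probability of producing $y$ when the instruction asks for a different language, so the model is penalized for ignoring the language tag.

```latex
\begin{aligned}
\mathcal{L}^{MLE}(\theta) &= -\sum_{t=1}^{|y|} \log p_\theta\left(y_t \mid y_{<t}, I, x\right) \\
\mathcal{L}^{UL}(\theta)  &= -\sum_{t=1}^{|y|} \log\left(1 - p_\theta\left(y_t \mid y_{<t}, \tilde{I}, x\right)\right) \\
\mathcal{L}(\theta)       &= \mathcal{L}^{MLE}(\theta) + \alpha\,\mathcal{L}^{UL}(\theta)
\end{aligned}
```

Stage 1 would optimize $\mathcal{L}^{MLE}$ alone; stage 2 would optimize the combined $\mathcal{L}$.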

Abstract

Translation-tailored large language models (LLMs) exhibit remarkable translation capabilities, even competing with commercial translation systems trained with supervised learning. However, off-target translation remains an unsolved problem, especially for low-resource languages, hindering the development of accurate LLM-based translation models. To mitigate the off-target translation problem and improve LLM translation performance, recent works have either designed advanced prompting strategies that highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs by feeding them few-shot demonstrations. However, these methods do not fundamentally improve the LLM's ability to follow translation instructions, especially the language-direction information. In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability of LLMs, particularly with respect to the translation direction. Specifically, in the first stage we tune LLMs with the maximum likelihood estimation (MLE) loss on a translation dataset to elicit basic translation capabilities. In the second stage, we construct instruction-conflicting samples by randomly replacing the translation direction in the instruction with a wrong one, and introduce an extra unlikelihood loss on those samples. Experiments on the IWSLT and WMT benchmarks with the LLaMA model, spanning 16 zero-shot directions, show that, compared with a competitive baseline (LLaMA fine-tuned on translation data), our method effectively reduces the off-target translation ratio (by 53.3% on average), thereby improving translation quality by an average of +5.7 SacreBLEU and +16.4 BLEURT. Analysis shows that our method preserves the model's general task performance on AlpacaEval. Code and models will be released at https://github.com/alphadl/LanguageAware_Tuning.
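The second-stage data construction can be sketched as below. This is a hedged illustration, not the released code: the instruction template (`build_prompt`), the helper names (`Sample`, `make_pair`), and the choice to replace only the target language are all assumptions made for the example. Each translation pair yields a correct sample (trained with MLE) and an instruction-conflicting sample that keeps the same source and reference but names a randomly chosen wrong target language (trained with the unlikelihood loss).

```python
import random
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Sample:
    prompt: str
    response: str
    conflicting: bool  # True: trained with the unlikelihood loss; False: with MLE


def build_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    # Hypothetical instruction template (an assumption, not the paper's wording).
    return f"Translate the following text from {src_lang} to {tgt_lang}.\n{text}"


def make_pair(
    src_lang: str,
    tgt_lang: str,
    source: str,
    target: str,
    languages: Sequence[str],
    rng: random.Random,
) -> Tuple[Sample, Sample]:
    """Return (correct sample, instruction-conflicting sample).

    The conflicting sample keeps the same source text and reference
    translation, but its instruction names a randomly chosen wrong
    target language.
    """
    candidates = [lang for lang in languages if lang != tgt_lang]
    if not candidates:
        raise ValueError("need at least one language other than the target")
    wrong_tgt = rng.choice(candidates)
    correct = Sample(build_prompt(src_lang, tgt_lang, source), target, conflicting=False)
    conflicting = Sample(build_prompt(src_lang, wrong_tgt, source), target, conflicting=True)
    return correct, conflicting


if __name__ == "__main__":
    rng = random.Random(0)
    langs = ["English", "German", "French", "Romanian"]
    ok, bad = make_pair("English", "German", "Good morning.", "Guten Morgen.", langs, rng)
    print(ok.prompt)
    print(bad.prompt)  # names English, French, or Romanian instead of German
```

Passing an explicit `random.Random` instance keeps the sampling reproducible across runs, which is convenient when regenerating the second-stage training set.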

Paper Structure

This paper contains 27 sections, 3 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Off-target translation ratio (OTR %; lower is better) in zero-shot translation on the WMT dataset. We compare LLaMA-MT, a LLaMA model fine-tuned on the translation dataset, with our model.
  • Figure 2: Overview of our fine-tuning framework for zero-shot translation. (a) In the first stage, we pre-tune the LLM with the MLE loss on multilingual translation samples to unlock its translation ability. (b) In the second stage, we construct instruction-conflicting samples by randomly substituting the translation direction in the instruction with a different one. We then train the model with the MLE loss $\mathcal{L}^{MLE}$ on the translation data plus an unlikelihood loss $\mathcal{L}^{UL}$ on the instruction-conflicting samples.
  • Figure 3: Ablation studies: (a) the number of continued training steps; (b) the mixing hyper-parameter $\alpha$. Both panels show zero-shot translation performance after the second stage of training.
  • Figure 4: Impact of the fine-tuning translation data size. We report BLEURT and OTR scores on the IWSLT dataset; the x-axis is the fine-tuning data size $n$.
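Figures 1 and 4 report the off-target translation ratio (OTR). A minimal sketch of how such a ratio is commonly computed, assuming OTR is the percentage of system outputs whose detected language differs from the requested target language. The paper's language identifier is not named in this section, so `detect_lang` is a hypothetical callable, and `toy_detect` is a keyword-matching stand-in for demonstration only; a real evaluation would plug in an actual language-identification model.

```python
from typing import Callable, Sequence


def off_target_ratio(
    hypotheses: Sequence[str],
    target_lang: str,
    detect_lang: Callable[[str], str],
) -> float:
    """Percentage of hypotheses whose detected language is not target_lang."""
    if not hypotheses:
        raise ValueError("no hypotheses to score")
    off_target = sum(1 for hyp in hypotheses if detect_lang(hyp) != target_lang)
    return 100.0 * off_target / len(hypotheses)


def toy_detect(text: str) -> str:
    # Hypothetical stand-in for a real language identifier:
    # tags text as German if it contains a few common German words.
    words = text.lower().split()
    german_markers = {"der", "die", "das", "und", "guten"}
    return "de" if any(w in german_markers for w in words) else "en"


if __name__ == "__main__":
    outputs = ["Guten Morgen.", "Good morning.", "Die Katze schläft.", "The cat sleeps."]
    print(off_target_ratio(outputs, "de", toy_detect))  # 50.0
```

Taking the detector as a parameter keeps the metric independent of any particular language-ID tool.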