Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions

Jiahuan Li, Hao Zhou, Shujian Huang, Shanbo Cheng, Jiajun Chen

TL;DR

This work introduces Multilingual Finetuning with Translation Instructions (mFTI), a method that elicits translation ability in multilingual LLMs by training them to follow explicit translation instructions rather than relying on in-context learning (ICL) demonstrations. Using XGLM-7.5B with data from WikiMatrix and FLORES-101, the authors show that mFTI can outperform 8-shot ICL across 156 language directions, that performance scales with model size and data quality, and that pivot-language information helps improve cross-language alignment. However, mFTI still lags behind the strongest supervised MT systems, and several error types persist, notably off-target translation (OT) and oscillatory hallucination (OH), which points to remaining gaps in direct language alignment and instruction following. The work further shows that increasing the diversity of language pairs and adding monolingual generation instructions can reduce some of these errors and improve generalization, offering practical paths toward more robust zero-shot translation in multilingual LLMs.

Abstract

Large-scale Pretrained Language Models (LLMs), such as ChatGPT and GPT-4, have shown strong abilities in multilingual translation without being explicitly trained on parallel corpora. How LLMs obtain their ability to carry out translation instructions for different languages remains an interesting question. In this paper, we present a detailed analysis by finetuning a multilingual pretrained language model, XGLM-7.5B, to perform multilingual translation following given instructions. Firstly, we show that multilingual LLMs have stronger translation abilities than previously demonstrated. For a given language, performance depends on its similarity to English and on the amount of data for that language used in the pretraining phase. Secondly, we find that LLMs' ability to carry out translation instructions relies on the understanding of translation instructions and the alignment among different languages. With multilingual finetuning, LLMs can learn to perform the translation task well even for language pairs unseen during the instruction tuning phase.
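
As a rough illustration of what finetuning on translation instructions involves, the sketch below turns parallel sentence pairs into prompt/completion examples covering both translation directions. The instruction wording in `TEMPLATE`, the `ParallelPair` type, and the helper names are assumptions made for illustration; this summary does not give the paper's actual template or data pipeline.

```python
from dataclasses import dataclass

# Hypothetical instruction template: the exact wording used in the paper
# is not stated in this summary, so this is only a plausible stand-in.
TEMPLATE = "Translate the following sentence from {src} to {tgt}.\n{src}: {text}\n{tgt}:"


@dataclass(frozen=True)
class ParallelPair:
    """One aligned sentence pair (hypothetical container type)."""
    src_lang: str
    tgt_lang: str
    src_text: str
    tgt_text: str


def build_example(pair: ParallelPair) -> dict:
    """Format one pair as an instruction prompt plus the expected completion."""
    prompt = TEMPLATE.format(src=pair.src_lang, tgt=pair.tgt_lang, text=pair.src_text)
    # The leading space separates the completion from the trailing colon.
    return {"prompt": prompt, "completion": " " + pair.tgt_text}


def build_dataset(pairs: list, both_directions: bool = True) -> list:
    """Build finetuning examples, optionally adding the reversed direction."""
    examples = []
    for pair in pairs:
        examples.append(build_example(pair))
        if both_directions:
            reverse = ParallelPair(pair.tgt_lang, pair.src_lang,
                                   pair.tgt_text, pair.src_text)
            examples.append(build_example(reverse))
    return examples


if __name__ == "__main__":
    pairs = [ParallelPair("German", "English", "Guten Morgen.", "Good morning.")]
    for example in build_dataset(pairs):
        print(example["prompt"] + example["completion"])
        print("---")
```

In a finetuning setup of this kind, the loss would typically be computed on the completion tokens only, so the model learns to produce the translation given the instruction; whether the authors do exactly this is not stated here.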

Paper Structure

This paper contains 45 sections, 2 equations, 7 figures, and 8 tables.

Figures (7)

  • Figure 1: Translation performance of 8-shot ICL and mFTI using 1000 sentences per language pair. Languages are ordered by the amount of data in the pretraining corpus.
  • Figure 2: Comparison of mFTI with conventional supervised machine translation models. Performance is measured in BLEU.
  • Figure 3: Change in BLEU score after pivoting through English for 8-shot ICL and mFTI.
  • Figure 4: Translation performance of finetuned XGLM as the number of model parameters and training examples scales.
  • Figure 5: Translation performance on different partitions as the number of language pairs grows. Left: partitions where sentences of both the source and target languages are seen during training. Right: partitions where source and/or target language sentences are unseen during training.
  • ...and 2 more figures