Tower: An Open Multilingual Large Language Model for Translation-Related Tasks

Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G. C. de Souza, André F. T. Martins

Abstract

While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task. In this paper, we propose a recipe for tailoring LLMs to multiple tasks present in translation workflows. We perform continued pretraining on a multilingual mixture of monolingual and parallel data, creating TowerBase, followed by finetuning on instructions relevant for translation processes, creating TowerInstruct. Our final model surpasses open alternatives on several tasks relevant to translation workflows and is competitive with general-purpose closed LLMs. To facilitate future research, we release the Tower models, our specialization dataset, an evaluation framework for LLMs focusing on the translation ecosystem, and a collection of model generations, including ours, on our benchmark.

Paper Structure (44 sections, 9 figures, 25 tables)

Figures (9)

  • Figure 1: Illustration of our method for building TowerBase and TowerInstruct.
  • Figure 2: Translation quality on Flores-200 and WMT23 for TowerInstruct models and a collection of open and closed models across different scales. As the scale of GPT models is not known, we represent them with a horizontal line. TowerInstruct outperforms open alternatives, even those of larger scale, and is competitive with GPT models.
  • Figure 3: Tasks included in our supervised finetuning dataset TowerBlocks.
  • Figure 4: Win-rate margin of TowerInstruct-13B by length of the tokenized source for (a) en$\rightarrow$xx and (b) xx$\rightarrow$en language pairs on the WMT23 test set. We compare against GPT-4 ($\square$) and Alma-R ($\triangle$). We define a (sentence-level) win as a difference between two systems of more than 1 Comet-22 point.
  • Figure 5: Comparison of NLLB 3B original translation quality (x-axis) with TowerInstruct 13B post-edition quality (y-axis), and a concrete example (left). Each dot is a WMT23 zh$\rightarrow$en translation. Marker size and hue represent the difference between post-edition and original translation quality. The source and reference of the highlighted post-edition are "对这个代理公司和亚马逊实在是很无语。" and "As it relates to this agency and Amazon, I am truly stunned.", respectively. Similar patterns hold for other language pairs.
  • ...and 4 more figures
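
The sentence-level win criterion in the Figure 4 caption can be illustrated with a minimal sketch. The caption only defines a win (a gap of more than 1 Comet-22 point); the margin formula used here, wins minus losses divided by the number of sentences, is an assumption, as are the function name `win_rate_margin`, the 0-100 score scale, and the toy scores.

```python
from typing import Sequence


def win_rate_margin(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    threshold: float = 1.0,
) -> float:
    """Return (wins - losses) / n for system A against system B.

    Assumptions (not taken from the paper): scores are sentence-level
    Comet-22 values on a 0-100 scale, and the margin is the win rate
    minus the loss rate. A sentence counts as a win for A when A's score
    exceeds B's by more than `threshold` points, and as a loss when B's
    score exceeds A's by more than `threshold`; anything else is a tie.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    if not scores_a:
        raise ValueError("at least one sentence is required")

    wins = sum(1 for a, b in zip(scores_a, scores_b) if a - b > threshold)
    losses = sum(1 for a, b in zip(scores_a, scores_b) if b - a > threshold)
    return (wins - losses) / len(scores_a)


# Toy scores, invented for illustration only.
system_a = [85.0, 80.0, 70.0, 90.0]
system_b = [82.0, 80.5, 73.0, 88.0]
margin = win_rate_margin(system_a, system_b)
```

To reproduce a per-length breakdown like the one in the figure, the same function could be applied separately to each bucket of source lengths; the bucketing scheme itself is not specified in the caption.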