Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level

Zhaopeng Feng, Ruizhe Chen, Yan Zhang, Zijie Meng, Zuozhu Liu

TL;DR

MT-Ladder is a novel model-agnostic, cost-effective tool that refines the translations produced by general LLMs. It is trained on pseudo-refinement triplets, which can be easily obtained from existing LLMs without additional human cost.

Abstract

General-purpose Large Language Models (LLMs) like GPT-4 have achieved remarkable advancements in machine translation (MT) by leveraging extensive web content. Translation-specific LLMs, by contrast, are built by pre-training on domain-specific monolingual corpora and fine-tuning with human-annotated translation data. Despite their superior performance, these methods demand either an unprecedented scale of computing and data or substantial human editing and annotation effort. In this paper, we develop MT-Ladder, a novel model-agnostic and cost-effective tool to refine the performance of general LLMs for MT. MT-Ladder is trained on pseudo-refinement triplets, which can be easily obtained from existing LLMs without additional human cost. During training, we propose a hierarchical fine-tuning strategy with an easy-to-hard schema, which progressively improves MT-Ladder's refining performance. The trained MT-Ladder can be seamlessly integrated with any general-purpose LLM to boost its translation performance. Using Gemma-2B/7B as the backbone, MT-Ladder-2B can elevate raw translations to the level of top-tier open-source models (e.g., improving BigTranslate-13B by +6.91 BLEU and +3.52 COMET for XX-En), and MT-Ladder-7B can further enhance model performance to be on par with the state-of-the-art GPT-4. Extensive ablation and analysis corroborate the effectiveness of MT-Ladder in diverse settings. Our code is available at https://github.com/fzp0424/MT-Ladder
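The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Triplet` class, the `token_f1` scorer (a toy stand-in for a learned metric such as COMET), and the thresholds `low` and `high` are all hypothetical. The sketch also assumes that an intermediate translation far from the reference counts as an "easy" refinement example; the abstract only states that the schema runs from easy to hard.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container for one pseudo-refinement triplet."""
    source: str        # sentence from the parallel corpus
    intermediate: str  # translation sampled from an existing LLM
    reference: str     # reference translation from the parallel corpus


def build_triplets(
    parallel_pairs: Sequence[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[Triplet]:
    """Pair each (source, reference) with a sampled intermediate translation."""
    return [Triplet(src, translate(src), ref) for src, ref in parallel_pairs]


def token_f1(hypothesis: str, reference: str) -> float:
    """Toy quality score: token-overlap F1 (assumption, not the paper's metric)."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    if not hyp or not ref:
        return 0.0
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def easy_to_hard_stages(
    triplets: Sequence[Triplet],
    score: Callable[[str, str], float] = token_f1,
    low: float = 0.4,   # hypothetical threshold
    high: float = 0.8,  # hypothetical threshold
) -> List[List[Triplet]]:
    """Split triplets into [easy, medium, hard] stages for sequential tuning.

    Assumption: a low-scoring intermediate translation leaves the most room
    for improvement and is therefore treated as an easy example.
    """
    easy, medium, hard = [], [], []
    for triplet in triplets:
        quality = score(triplet.intermediate, triplet.reference)
        if quality < low:
            easy.append(triplet)
        elif quality < high:
            medium.append(triplet)
        else:
            hard.append(triplet)
    return [easy, medium, hard]
```

A fine-tuning loop would then iterate over the returned stages in order, training on each stage before moving to the next. Shuffling all triplets together corresponds to a non-hierarchical baseline.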

Paper Structure

This paper contains 22 sections, 1 equation, 16 figures, and 7 tables.

Figures (16)

  • Figure 1: Average translation quality improvements across 8 translation directions on the WMT22 test set (Zh$\leftrightarrow$En, De$\leftrightarrow$En, En$\leftrightarrow$Ru, En$\leftrightarrow$Cs) using MT-Ladder-2B or MT-Ladder-7B. Scores are computed with COMET-22 (wmt22-comet-da) (Rei et al., 2020).
  • Figure 2: MT-Ladder is obtained in two steps: a) sample from an LLM using the parallel corpus to create pseudo-refinement triplet training data; b) use a hierarchical fine-tuning method with an easy-to-hard schema to tune the base model and obtain MT-Ladder. MT-Ladder can refine models with significantly higher parameter counts than the sampling LLM and the base model, and it can raise original translations from various sources to the next level.
  • Figure 3: Prompts used. [source language] and [target language] represent the full names of the languages, [source sentence] is the sentence to be translated, and [intermediate translation] is the sampled translation. For Direction Translation, we follow Xu et al. (2023).
  • Figure 4: Comparison of original translation quality (x-axis) with MT-Ladder-7B refined quality (y-axis). Each dot is a WMT22 En-Zh translation. The percentage next to each marker gives the proportion of translations in that part of the plot.
  • Figure 5: Trends in BLEU and COMET during training. HFT denotes our hierarchical fine-tuning from Easy to Hard examples, Mixed denotes training on shuffled mixed data without hierarchical fine-tuning, and Anti-HFT denotes HFT with the order reversed.
  • ...and 11 more figures