On-the-Fly Fusion of Large Language Models and Machine Translation

Hieu Hoang, Huda Khayrallah, Marcin Junczys-Dowmunt

TL;DR

This work improves translation by fusing, on the fly, a dedicated MT model with a prompted LLM, exploiting the complementary strengths of MT models trained on parallel data and monolingually trained LLMs. The method interpolates the two models' token probabilities, $p_{\text{ensemble}}(t_i) = \lambda\, p_{\text{MT}}(t_i) + (1 - \lambda)\, p_{\text{LLM}}(t_i)$, where $\lambda \in [0, 1]$ is the mixing ratio ($\lambda = 1$ recovers the MT baseline). Prompts that encode the domain and document context induce translation-aware behavior in the LLM. Experiments on four language pairs (both directions) show that an LLM that is weaker at translation can still improve MT output, and that the MT+LLM ensemble can outperform an ensemble of two MT models in many settings, especially when document context is used. The study highlights the practical value of translation-specific prompting and document-level information, while also noting limitations such as domain variability, resource constraints, and sensitivity to the choice of LLM.
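A minimal Python sketch of the interpolation formula, applied at every step of greedy decoding. It assumes both models score the same target vocabulary (tokenizer mismatch is not handled) and represents each next-token distribution as a plain dict. `ensemble_step`, `greedy_ensemble_decode`, `toy_mt`, and `toy_llm` are hypothetical illustrations, not the authors' implementation.

```python
from typing import Callable, Dict, List

# A next-token distribution over a (shared) target vocabulary.
Dist = Dict[str, float]
# Hypothetical model stand-in: maps the target prefix so far to a next-token distribution.
NextTokenFn = Callable[[List[str]], Dist]


def ensemble_step(p_mt: Dist, p_llm: Dist, lam: float) -> Dist:
    """p_ens(t) = lam * p_mt(t) + (1 - lam) * p_llm(t); assumes a shared vocabulary."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # A sorted vocabulary makes tie-breaking in the argmax below deterministic.
    vocab = sorted(set(p_mt) | set(p_llm))
    return {t: lam * p_mt.get(t, 0.0) + (1.0 - lam) * p_llm.get(t, 0.0) for t in vocab}


def greedy_ensemble_decode(
    mt: NextTokenFn, llm: NextTokenFn, lam: float, eos: str = "</s>", max_len: int = 50
) -> List[str]:
    """Greedy decoding where both models are queried on the same prefix at every step."""
    prefix: List[str] = []
    for _ in range(max_len):
        dist = ensemble_step(mt(prefix), llm(prefix), lam)
        token = max(dist, key=dist.get)  # token with the highest interpolated probability
        if token == eos:
            break
        prefix.append(token)
    return prefix


# Toy distributions standing in for the MT model and the prompted LLM.
def toy_mt(prefix: List[str]) -> Dist:
    return {"Haus": 0.6, "</s>": 0.4} if not prefix else {"Haus": 0.1, "</s>": 0.9}


def toy_llm(prefix: List[str]) -> Dist:
    return {"Gebäude": 0.8, "</s>": 0.2} if not prefix else {"Gebäude": 0.1, "</s>": 0.9}


if __name__ == "__main__":
    print(greedy_ensemble_decode(toy_mt, toy_llm, lam=1.0))  # ['Haus'] (pure MT)
    print(greedy_ensemble_decode(toy_mt, toy_llm, lam=0.5))  # ['Gebäude']
```

With `lam=1.0` the loop reproduces the MT model's greedy choice, which corresponds to the baseline end of the mixing ratio; at `lam=0.5` the LLM's preference is strong enough to change the first token.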

Abstract

We propose the on-the-fly ensembling of a machine translation model with an LLM, prompted on the same task and input. We perform experiments on 4 language pairs (both directions) with varying amounts of data. We find that a slightly weaker-at-translation LLM can improve the translations of an NMT model, and that ensembling with an LLM can produce better translations than ensembling two stronger MT models. We combine our method with various techniques from LLM prompting, such as in-context learning and translation context.
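The prompting side can be sketched as a small prompt builder. This is a hypothetical Python illustration: the wording is invented (the actual templates are shown in Figures 3 to 5 and are not reproduced here), and past document context is assumed to be supplied in the same format as few-shot examples, as preceding source sentences paired with their translations. The LLM's scores for the tokens that follow the final line are what would enter the interpolation with the MT model.

```python
from typing import List, Optional, Tuple


def build_translation_prompt(
    source: str,
    src_lang: str,
    tgt_lang: str,
    domain: Optional[str] = None,
    context: Optional[List[Tuple[str, str]]] = None,
) -> str:
    """Build a plain-text translation prompt (hypothetical wording, not the paper's template).

    `context` holds (source, translation) pairs: either few-shot examples or the
    preceding sentences of the same document with their translations (past context).
    """
    lines: List[str] = [f"Translate the following {src_lang} text into {tgt_lang}."]
    if domain:
        lines.append(f"Domain: {domain}")  # optional domain hint
    for ctx_src, ctx_tgt in context or []:
        lines.append(f"{src_lang}: {ctx_src}")
        lines.append(f"{tgt_lang}: {ctx_tgt}")
    lines.append(f"{src_lang}: {source}")
    # End with the target-language tag so the LLM's continuation is the translation.
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_translation_prompt(
            "Wie geht es dir?",
            "German",
            "English",
            domain="TED talks",
            context=[("Hallo zusammen.", "Hello everyone.")],
        )
    )
```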

Paper Structure (23 sections, 3 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: TED-100 translation quality for varying numbers of prompt examples (few-shot examples or past context). Prompting with context outperforms few-shot prompting, and it performs best when ensembled.
  • Figure 2: Using an LLM with a translation prompt and without any prompting (ru-en). Without a prompt, the ensemble is strictly worse than the MT baseline (mixing ratio $\lambda=1$).
  • Figure 3: Baseline translation prompt.
  • Figure 4: Translation prompt with domain.
  • Figure 5: n-shot translation prompt.