Simultaneous Machine Translation with Large Language Models

Minghan Wang, Jinming Zhao, Thuy-Trang Vu, Fatemeh Shiri, Ehsan Shareghi, Gholamreza Haffari

TL;DR

This paper investigates using large language models (LLMs) for simultaneous machine translation (SimulMT) by converting an off-the-shelf LLM into a SimulMT agent via an incremental-read/write framework. It introduces a latency-reducing Relaxed Agreement Longest Common Prefix (RALCP) and a Read-$n$ incremental-decoding policy, enabling LLM-based SimulMT without additional training beyond lightweight fine-tuning with LoRA. Experiments on nine MUST-C language pairs with Llama2-7B-chat show that LLMs can surpass dedicated MT models in BLEU and LAAL under the same decoding policy, with notable robustness to noisy inputs and improved data-utilization efficiency when combined with prefix-based SFT. However, the approach incurs substantial computational cost, highlighting a crucial bottleneck for practical deployment and motivating future research into more adaptive policies and efficiency-oriented strategies.

Abstract

Real-world simultaneous machine translation (SimulMT) systems face more challenges than just the quality-latency trade-off. They also need to address issues related to robustness with noisy input, processing long contexts, and flexibility for knowledge injection. These challenges demand models with strong language understanding and generation capabilities, which dedicated MT models often lack. In this paper, we investigate the possibility of applying Large Language Models (LLMs) to SimulMT tasks by using existing incremental-decoding methods with a newly proposed RALCP algorithm for latency reduction. We conducted experiments using the Llama2-7b-chat model on nine language pairs from the MUST-C dataset. The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics. Further analysis indicates that the LLM has advantages in terms of tuning efficiency and robustness. However, the computational cost of the LLM remains a significant obstacle to its application in SimulMT.
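
The incremental read/write loop mentioned above can be illustrated with a minimal Python sketch. It assumes one plausible reading of a Read-n policy: read n source words, then ask the model for target tokens that are safe to commit, and repeat until the source ends. The names `read_n_loop`, `propose_stable_tokens`, and `toy_translator` are hypothetical and do not come from the paper; the toy translator only uppercases words so that the loop can run without an LLM.

```python
from typing import Callable, List

# Hypothetical signature: (source so far, target so far, source finished?) -> new tokens
Proposer = Callable[[List[str], List[str], bool], List[str]]


def read_n_loop(source_words: List[str], n: int, propose_stable_tokens: Proposer) -> List[str]:
    """Alternate READ and WRITE actions, writing after every n source words."""
    source_buffer: List[str] = []  # source words read so far (would go into the prompt)
    target_buffer: List[str] = []  # target tokens already committed
    for i, word in enumerate(source_words, start=1):
        source_buffer.append(word)  # READ one source word
        finished = i == len(source_words)
        if i % n != 0 and not finished:
            continue  # keep reading until n new words have arrived
        # WRITE: commit whatever the model considers stable at this point
        target_buffer.extend(propose_stable_tokens(source_buffer, target_buffer, finished))
    return target_buffer


def toy_translator(source: List[str], target: List[str], finished: bool) -> List[str]:
    """Stand-in for the LLM: 'translates' by uppercasing, holding back the
    newest source word until the input is complete (an assumption made only
    to mimic the idea of committing a stable prefix)."""
    end = len(source) if finished else len(source) - 1
    return [w.upper() for w in source[len(target):end]]


if __name__ == "__main__":
    print(read_n_loop("a b c d e".split(), n=2, propose_stable_tokens=toy_translator))
```

In a real system the proposer would call the LLM with the current source and target buffers in the prompt and use a prefix-selection step to decide which tokens to commit.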

Paper Structure (31 sections, 2 equations, 6 figures, 8 tables, 1 algorithm)

Figures (6)

  • Figure 1: The pipeline of our framework. Source texts are read from the streaming input buffer and incrementally added to the prompt. Target texts are written to the streaming output buffer and are also added to the prompt incrementally. RALCP denotes the Relaxed Agreement Longest Common Prefix algorithm proposed in this paper.
  • Figure 2: An example in which the LCP algorithm fails to find a common prefix because the candidates differ in their first token, while RALCP still returns a prefix thanks to its relaxed constraints. For RALCP, words at the same position are annotated with the same color group, and their votes are indicated by the darkness of the color. The selected prefix is annotated with a gray background.
  • Figure 3: How much SimulMT performance (BLEU, in %) is retained with reduced data, relative to training on the full dataset (all): (i) one-shot, (ii) training-set sizes varying from 1K to 100K, and (iii) multilingual SFT on all data (multi-L). The legend shows each language pair and its coverage in the Llama2 pretraining data.
  • Figure 4: The performance in BLEU and COMET of baseline methods and the LLM with ground-truth or ASR transcripts as input (averaged across 9 language pairs).
  • Figure 5: The average time of predicting one target token (in milliseconds) of baseline models and LLM under offline and simultaneous scenarios.
  • ...and 1 more figure
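
The contrast between LCP and RALCP described in the Figure 2 caption can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's algorithm as published: the threshold name `gamma`, its default value, and the choice to keep voting only among candidates that match the prefix accepted so far are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Sequence


def lcp(candidates: Sequence[Sequence[str]]) -> List[str]:
    """Strict longest common prefix: every candidate must agree at each position."""
    prefix: List[str] = []
    for tokens in zip(*candidates):
        if any(t != tokens[0] for t in tokens):
            break
        prefix.append(tokens[0])
    return prefix


def ralcp(candidates: Sequence[Sequence[str]], gamma: float = 0.6) -> List[str]:
    """Relaxed-agreement prefix: accept the top-voted token at each position
    while its share of ALL candidates is at least gamma (assumed threshold)."""
    total = len(candidates)
    if total == 0:
        return []
    alive = list(candidates)  # candidates still consistent with the prefix
    prefix: List[str] = []
    pos = 0
    while True:
        votes = Counter(c[pos] for c in alive if len(c) > pos)
        if not votes:
            break  # every remaining candidate has ended
        token, count = votes.most_common(1)[0]
        if count / total < gamma:
            break  # agreement too weak to commit this token
        prefix.append(token)
        alive = [c for c in alive if len(c) > pos and c[pos] == token]
        pos += 1
    return prefix


if __name__ == "__main__":
    beams = [
        ["the", "cat", "sat"],
        ["the", "cat", "sits"],
        ["a", "cat", "sat"],
        ["the", "cat", "sat"],
    ]
    print(lcp(beams))         # strict agreement fails at the first token
    print(ralcp(beams, 0.6))  # relaxed agreement still commits a prefix
```

With `gamma` set to 1.0 this sketch reduces to the strict LCP behavior, so the threshold directly controls how much disagreement among candidates is tolerated before writing stops.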