Simultaneous Machine Translation with Large Language Models
Minghan Wang, Jinming Zhao, Thuy-Trang Vu, Fatemeh Shiri, Ehsan Shareghi, Gholamreza Haffari
TL;DR
This paper investigates using large language models (LLMs) for simultaneous machine translation (SimulMT) by converting an off-the-shelf LLM into a SimulMT agent via an incremental read/write framework. It introduces Relaxed Agreement Longest Common Prefix (RALCP), an algorithm for reducing latency, and pairs it with a Read-$n$ incremental-decoding policy, so that LLM-based SimulMT requires no training beyond lightweight fine-tuning with LoRA. Experiments on nine MUST-C language pairs with Llama2-7B-chat show that LLMs can surpass dedicated MT models in BLEU and LAAL under the same decoding policy, with notable robustness to noisy inputs and improved data-utilization efficiency when combined with prefix-based SFT. However, the approach incurs substantial computational cost, which remains a bottleneck for practical deployment and motivates future research into more adaptive policies and efficiency-oriented strategies.
Abstract
Real-world simultaneous machine translation (SimulMT) systems face more challenges than just the quality-latency trade-off. They also need to address robustness to noisy input, processing of long contexts, and flexibility for knowledge injection. These challenges demand models with strong language understanding and generation capabilities, which dedicated MT models often lack. In this paper, we investigate the possibility of applying Large Language Models (LLMs) to SimulMT tasks by using existing incremental-decoding methods with a newly proposed RALCP algorithm for latency reduction. We conducted experiments using the \texttt{Llama2-7b-chat} model on nine different languages from the MUST-C dataset. The results show that LLMs outperform dedicated MT models in terms of BLEU and LAAL metrics. Further analysis indicates that LLMs have advantages in terms of tuning efficiency and robustness. However, the computational cost of LLMs remains a significant obstacle to their application in SimulMT.
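
The abstract names RALCP but does not spell out how it works. The following is a minimal sketch of one plausible reading, assuming that the strict longest-common-prefix rule over beam candidates is relaxed to a voting threshold: a token is committed as long as a sufficient fraction of candidates agree on it. The function name `relaxed_agreement_prefix`, the `threshold` parameter, and its default value are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence


def relaxed_agreement_prefix(
    candidates: Sequence[Sequence[str]],
    threshold: float = 0.6,  # assumed agreement ratio; hypothetical default
) -> List[str]:
    """Return the longest prefix that at least `threshold` of all candidates share.

    With threshold=1.0 this reduces to the ordinary longest common prefix.
    """
    if not candidates:
        return []

    min_votes = threshold * len(candidates)
    active = list(candidates)  # candidates that still match the prefix so far
    prefix: List[str] = []
    pos = 0

    while True:
        # Count which token each still-matching candidate proposes at `pos`.
        votes = Counter(c[pos] for c in active if len(c) > pos)
        if not votes:
            break  # every remaining candidate is exhausted
        token, count = votes.most_common(1)[0]
        if count < min_votes:
            break  # agreement fell below the threshold
        prefix.append(token)
        # Only candidates that voted for the winning token stay in the race.
        active = [c for c in active if len(c) > pos and c[pos] == token]
        pos += 1

    return prefix


beams = [
    ["the", "cat", "sat"],
    ["the", "cat", "is"],
    ["the", "dog", "sat"],
]
relaxed = relaxed_agreement_prefix(beams, threshold=0.6)
strict = relaxed_agreement_prefix(beams, threshold=1.0)
```

In this toy example the relaxed rule commits two tokens where the strict rule commits only one, which illustrates why relaxing the agreement requirement could reduce latency: more target tokens can be written before the next read.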
