Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models

Victor Agostinelli; Max Wild; Matthew Raffel; Kazi Ahmed Asif Fuad; Lizhong Chen

TL;DR

Simultaneous translation with large language models presents unique challenges due to partial source contexts and dynamic prompts. The authors introduce Simul-LLM, an open-source PyTorch-based framework for fine-tuning and evaluating SimulMT LLMs, and compare adapting NMT LLMs to SimulMT against SimulMT-specific prompting. They propose two prompt structures to bridge the gap between fine-tuning and inference, and investigate the impact of wait-$k$ curricula, finding that a higher wait-$k$ during fine-tuning can improve generalizability, though results depend on the language pair. The work provides a reproducible platform for the systematic study of LLM-based SimulMT and highlights both potential gains in translation quality and remaining challenges in generalizability and efficiency.

Abstract

Large language models (LLMs) with billions of parameters, pretrained on massive amounts of data, are now capable of performance near or better than the state of the art in a variety of downstream natural language processing tasks. Neural machine translation (NMT) is one such task to which LLMs have been applied with great success. However, little research has focused on applying LLMs to the more difficult subset of NMT called simultaneous translation (SimulMT), where translation begins before the entire source context is available to the model. In this paper, we address key challenges facing LLMs fine-tuned for SimulMT, validate classical SimulMT concepts and practices in the context of LLMs, explore adapting LLMs that are fine-tuned for NMT to the task of SimulMT, and introduce Simul-LLM, the first open-source fine-tuning and evaluation pipeline development framework for LLMs focused on SimulMT.

Paper Structure (30 sections, 2 equations, 3 figures, 5 tables)

Introduction
Background and Motivation
Large Language Models for Neural Machine Translation
Simultaneous Translation
Motivation for Applying LLMs to SimulMT
Simul-LLM: an Open-Source SimulMT LLM Fine-tuning Framework
Fine-tuning Wrapper and Features
Evaluation Agent and Features
Adapting NMT LLMs to SimulMT
Prompt Structure for SimulMT LLMs
Split Source-Target Prompt Structure
Single Output Word Prompt Structure
Evaluation Methodology
Dataset Selection and Preprocessing
Word or Token-Based Wait-k for LLMs
...and 15 more sections

Figures (3)

Figure 1: Depiction of the Simul-LLM fine-tuning wrapper framework. High level specifications and hyperparameters are passed to the wrapper on instantiation, which employs a specified prompt constructor, instantiates a specified LLM foundational model, optionally constructs a PEFT config, and optionally constructs a quantization config via .
Figure 2: Depiction of the Simul-LLM evaluation agent framework. The SimulMT agent receives the incremental source from SimulEval (left of the figure) and sends the finalized translation step hypothesis to SimulEval (right of the figure), which manages lagging/latency calculation and translation quality scoring.
Figure 3: Example of English to Spanish translation prompt construction with an incremental source x and an incremental output y applied via our proposed expanded dataset. Without more complex loss filtering than is typical, the entire output sequence for the split source-target prompt structure would be scored and the model would learn for wait-k schedules ranging from wait-i to wait-k as opposed to just wait-k.
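The incremental prompt construction described in the Figure 3 caption can be illustrated with a minimal Python sketch. It assumes a word-level wait-k schedule in which the model has read min(k + t, |x|) source words before writing target word t (0-indexed), and it expands one sentence pair into one example per target word. The function names `expand_wait_k` and `build_prompt` and the prompt template are hypothetical, chosen for illustration only; they are not taken from the Simul-LLM code base.

```python
from typing import List, Tuple


def expand_wait_k(source: str, target: str, k: int) -> List[Tuple[str, str, str]]:
    """Expand one sentence pair into incremental examples.

    Each example is (source_prefix, target_prefix, next_target_word),
    following a word-level wait-k schedule (an assumption of this sketch).
    """
    src_words = source.split()
    tgt_words = target.split()
    examples = []
    for t, next_word in enumerate(tgt_words):
        # Before writing target word t, wait-k has read k + t source words,
        # capped at the full source length.
        n_read = min(k + t, len(src_words))
        src_prefix = " ".join(src_words[:n_read])
        tgt_prefix = " ".join(tgt_words[:t])
        examples.append((src_prefix, tgt_prefix, next_word))
    return examples


def build_prompt(src_prefix: str, tgt_prefix: str,
                 src_lang: str = "English", tgt_lang: str = "Spanish") -> str:
    """Format one incremental example with a hypothetical prompt template."""
    return (
        f"Translate from {src_lang} to {tgt_lang}.\n"
        f"{src_lang}: {src_prefix}\n"
        f"{tgt_lang}: {tgt_prefix}"
    )


if __name__ == "__main__":
    pairs = expand_wait_k("the cat sat on the mat",
                          "el gato se sentó en la alfombra", k=3)
    for src_prefix, tgt_prefix, next_word in pairs:
        print(build_prompt(src_prefix, tgt_prefix))
        print(f"  -> expected next word: {next_word}\n")
```

Because each expanded example targets a single next word given a fixed source prefix, the loss for that example reflects only the chosen wait-k schedule, which is the concern the caption raises about scoring an entire output sequence at once.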
