Self-Modifying State Modeling for Simultaneous Machine Translation

Donglei Yu; Xiaomian Kang; Yuchen Liu; Yu Zhou; Chengqing Zong

TL;DR

This work tackles the read/write decision problem in simultaneous machine translation (SiMT) by introducing Self-Modifying State Modeling (SM$^2$), a training paradigm that optimizes the decision at every state individually, without constructing full decision paths. A confidence-based Self-Modifying process estimates the credibility of each state by comparing SiMT predictions with offline MT predictions, while Prefix Sampling lets training cover all potential states. The approach yields state-wise policy optimization, improves alignment-rich reads, and remains compatible with bidirectional encoders, which also lets offline MT models acquire SiMT capability through fine-tuning. Experiments on Zh$\rightarrow$En, De$\rightarrow$En, and En$\rightarrow$Ro show better translation quality and policy learning than strong baselines across latency levels. The analysis also indicates that broader exploration and state-wise independence improve both learning efficiency and end-task performance.

Abstract

Simultaneous Machine Translation (SiMT) generates target outputs while receiving streaming source inputs. It requires a read/write policy that decides whether to wait for the next source token or generate a new target token; the sequence of these decisions forms a decision path. Existing SiMT methods, which learn the policy by exploring various decision paths in training, face inherent limitations. They cannot precisely optimize the policy, because the individual impact of each decision on SiMT performance cannot be assessed accurately, and they cannot sufficiently explore all potential paths, because the number of paths is vast. In addition, building decision paths requires unidirectional encoders to simulate streaming source inputs, which impairs the translation quality of SiMT models. To solve these issues, we propose Self-Modifying State Modeling (SM$^2$), a novel training paradigm for the SiMT task. Without building decision paths, SM$^2$ individually optimizes decisions at each state during training. To precisely optimize the policy, SM$^2$ introduces a Self-Modifying process that independently assesses and adjusts the decision at each state. For sufficient exploration, SM$^2$ proposes Prefix Sampling to efficiently traverse all potential states. Moreover, SM$^2$ is compatible with bidirectional encoders and thus achieves higher translation quality. Experiments show that SM$^2$ outperforms strong baselines. Furthermore, SM$^2$ allows offline machine translation models to acquire SiMT ability through fine-tuning.
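To make the notions of state, read/write decision, and decision path concrete, the following is a minimal sketch of a threshold-based policy loop. It is an illustration only, not the authors' implementation: the function name `simulate_policy`, the `confidence_fn` callback, the default threshold of 0.5, and the rule "write when confidence reaches the threshold" are all assumptions introduced here.

```python
from typing import Callable, List, Tuple

Decision = Tuple[str, int, int]  # (action, source tokens read, target tokens written)


def simulate_policy(
    source: List[str],
    max_target_len: int,
    confidence_fn: Callable[[int, int], float],
    threshold: float = 0.5,
) -> List[Decision]:
    """Trace a decision path for a toy confidence-threshold policy.

    confidence_fn(j, i) is a hypothetical stand-in for a learned confidence
    estimate: how safe it is to write target token i after reading j source
    tokens. Low confidence triggers READ, otherwise WRITE (an assumption).
    """
    path: List[Decision] = []
    j, i = 1, 0  # start with one source token read and nothing written
    while i < max_target_len:
        can_read = j < len(source)
        if can_read and confidence_fn(j, i + 1) < threshold:
            j += 1  # wait for the next source token
            path.append(("READ", j, i))
        else:
            i += 1  # emit the next target token
            path.append(("WRITE", j, i))
    return path


if __name__ == "__main__":
    # Toy confidence: confident only when the source prefix is one token ahead.
    toy_conf = lambda j, i: 1.0 if j >= i + 1 else 0.0
    for step in simulate_policy(["a", "b", "c", "d"], 3, toy_conf):
        print(step)
```

Each tuple in the returned list is one decision taken at one state, and the whole list is one decision path. The abstract's point is that SM$^2$ trains the decision at each state on its own rather than scoring such a path as a whole.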

Paper Structure (27 sections, 17 equations, 13 figures, 9 tables, 1 algorithm)

Introduction
Background
The Proposed Method
Self-Modifying for Confidence Estimation
Prefix Sampling
Confidence-based Policy in Inference
Experiments
Datasets
System Settings
Evaluation Metric
Results and Analysis
Simultaneous Translation Quality
Superiority of SM$^2$ in Learning Policy
Precise Optimization for Each Decision
Advantage of Sufficient Exploration
...and 12 more sections

Figures (13)

Figure 1: Illustration of different paradigms. (a) Training paradigm based on decision paths. All decisions along a path are optimized in an integrated manner. (b) Self-Modifying State Modeling. The decisions at each state are optimized individually.
Figure 2: Overview of SM$^2$. $S_j$ contains the states where $\mathbf{x}_{\leq j}$ has been received, and $S_M$ contains the states where the complete $\mathbf{x}$ has been received. A confidence net (ConfNet) estimates the confidence of each state. Model parameters are shared between the SiMT setting and the offline MT (OMT) setting. In this example, the source and target sentence lengths are $M=5$ and $N=4$, and $j=3$ in the Prefix Sampling step.
Figure 3: SacreBLEU against Average Lagging (AL) on Zh$\rightarrow$En, De$\rightarrow$En and En$\rightarrow$Ro.
Figure 4: COMET against Average Lagging (AL) on Zh$\rightarrow$En, De$\rightarrow$En and En$\rightarrow$Ro.
Figure 5: Evaluation of different SiMT policies. We calculate SA ($\uparrow$) under different latency levels.
...and 8 more figures
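
The captions above plot quality against Average Lagging (AL). As a reading aid, here is a minimal sketch of AL under its commonly used definition: for each target position $t$, $g(t)$ is the number of source tokens read before writing token $t$, and AL averages $g(t) - (t-1)/\gamma$ with $\gamma = |\mathbf{y}|/|\mathbf{x}|$ up to the first position where the full source has been read. The function name `average_lagging` and the fallback for inputs that never reach the full source are assumptions made here; the paper's own evaluation tooling is not described in this summary.

```python
from typing import Sequence


def average_lagging(g: Sequence[int], src_len: int) -> float:
    """Average Lagging for one sentence pair.

    g[t-1] is the number of source tokens read before writing target token t.
    """
    tgt_len = len(g)
    if tgt_len == 0 or src_len <= 0:
        raise ValueError("need a non-empty target and a positive source length")
    gamma = tgt_len / src_len  # target-to-source length ratio
    total = 0.0
    for t, read in enumerate(g, start=1):
        total += read - (t - 1) / gamma
        if read >= src_len:
            # tau: first target step at which the whole source has been read
            return total / t
    # Assumption: if the source is never fully read, average over all steps.
    return total / tgt_len


if __name__ == "__main__":
    print(average_lagging([1, 2, 3, 4], src_len=4))  # wait-1 style schedule
    print(average_lagging([4, 4, 4, 4], src_len=4))  # full-sentence schedule
```

Under this definition a schedule that stays one token behind the source gets an AL of 1, and a full-sentence (offline) schedule gets an AL equal to the source length, which is why lower AL on the x-axis corresponds to lower latency.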
