Self-Modifying State Modeling for Simultaneous Machine Translation
Donglei Yu, Xiaomian Kang, Yuchen Liu, Yu Zhou, Chengqing Zong
TL;DR
This work tackles the read/write decision problem in simultaneous machine translation (SiMT) by introducing Self-Modifying State Modeling (SM$^2$), a training paradigm that optimizes the decision at every state individually instead of constructing full decision paths. A confidence-based Self-Modifying process estimates the credibility of each state by comparing SiMT predictions with an offline MT baseline, while Prefix Sampling lets training explore all potential states. The approach yields state-wise policy optimization, improves alignment-rich reads, and remains compatible with bidirectional encoders, so offline MT models can acquire SiMT capability via fine-tuning. Empirical results on Zh$\rightarrow$En, De$\rightarrow$En, and En$\rightarrow$Ro show better translation quality and policy learning than strong baselines, with robust performance across latency levels. The results also indicate that broader exploration and state-wise independence improve both learning efficiency and end-task performance.
Abstract
Simultaneous Machine Translation (SiMT) generates target outputs while receiving streaming source inputs. It requires a read/write policy to decide whether to wait for the next source token or generate a new target token, and the resulting sequence of decisions forms a \textit{decision path}. Existing SiMT methods, which learn the policy by exploring various decision paths during training, face inherent limitations. They cannot precisely optimize the policy, because they cannot accurately assess the individual impact of each decision on SiMT performance, and they cannot sufficiently explore all potential paths, because the number of paths is vast. In addition, building decision paths requires unidirectional encoders to simulate streaming source inputs, which impairs the translation quality of SiMT models. To solve these issues, we propose \textbf{S}elf-\textbf{M}odifying \textbf{S}tate \textbf{M}odeling (SM$^2$), a novel training paradigm for the SiMT task. Without building decision paths, SM$^2$ individually optimizes the decision at each state during training. To precisely optimize the policy, SM$^2$ introduces a Self-Modifying process that independently assesses and adjusts the decision at each state. For sufficient exploration, SM$^2$ proposes Prefix Sampling to efficiently traverse all potential states. Moreover, SM$^2$ is compatible with bidirectional encoders and thus achieves higher translation quality. Experiments show that SM$^2$ outperforms strong baselines. Furthermore, SM$^2$ allows offline machine translation models to acquire SiMT ability through fine-tuning.
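
To make the notions of decision path and state concrete, the following is a minimal Python sketch, not the SM$^2$ training procedure itself. It rolls out a read/write policy into a decision path and compares the number of possible paths with the number of states. The names `decision_path`, `wait_k`, `count_paths`, and `count_states` are hypothetical, the wait-k rule is only an illustrative fixed policy, and the path count assumes every interleaving of reads and writes is admissible.

```python
from math import comb
from typing import Callable, List, Tuple

READ, WRITE = "R", "W"
Policy = Callable[[int, int], str]


def decision_path(src_len: int, tgt_len: int, policy: Policy) -> List[Tuple[int, int, str]]:
    """Roll out a policy. A state is (source tokens read, target tokens written)."""
    read, written, path = 0, 0, []
    while written < tgt_len:
        # Once the whole source has been read, WRITE is the only action left.
        action = WRITE if read == src_len else policy(read, written)
        path.append((read, written, action))
        if action == READ:
            read += 1
        else:
            written += 1
    return path


def wait_k(k: int) -> Policy:
    """Illustrative fixed policy (an assumption here): stay k source tokens ahead."""
    return lambda read, written: READ if read - written < k else WRITE


def count_paths(src_len: int, tgt_len: int) -> int:
    """Interleavings of src_len READs and tgt_len WRITEs (all assumed admissible)."""
    return comb(src_len + tgt_len, tgt_len)


def count_states(src_len: int, tgt_len: int) -> int:
    """One state per (source prefix length, target position) pair."""
    return src_len * tgt_len


if __name__ == "__main__":
    path = decision_path(4, 3, wait_k(2))
    print("".join(action for _, _, action in path))
    print(count_paths(4, 3), count_states(4, 3))
```

Under these assumptions the number of paths grows combinatorially with sentence length while the number of states grows only as the product of the two lengths, which is one way to read the abstract's argument that per-state optimization can cover all states where path-based exploration cannot cover all paths.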
