PsFuture: A Pseudo-Future-based Zero-Shot Adaptive Policy for Simultaneous Machine Translation

Libo Zhao, Jing Li, Ziqian Zeng

TL;DR

The paper proposes PsFuture, the first zero-shot adaptive read/write policy for SiMT, which lets the translation model decide read/write actions on its own without additional training. It also introduces Prefix-to-Full (P2F), a novel training strategy for adapting offline translation models to SiMT.

Abstract

Simultaneous Machine Translation (SiMT) requires target tokens to be generated in real time as streaming source tokens are consumed. Traditional approaches to SiMT typically require sophisticated architectures and extensive parameter configurations for training adaptive read/write policies, which in turn demand considerable computational power and memory. We propose PsFuture, the first zero-shot adaptive read/write policy for SiMT, enabling the translation model to independently determine read/write actions without additional training. Furthermore, we introduce a novel training strategy, Prefix-to-Full (P2F), specifically tailored to adapt offline translation models for SiMT applications, exploiting the advantages of the bidirectional attention mechanism inherent in offline models. Experiments across multiple benchmarks demonstrate that our zero-shot policy attains performance on par with strong baselines, and that the P2F method can further enhance performance, achieving an outstanding trade-off between translation quality and latency.

Paper Structure

This paper contains 22 sections, 7 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: A Zh$\to$En example demonstrating an ideal timing for predicting the next token "to". Even when provided with additional possible future information, the probability distribution of the predicted next token does not change significantly, remaining dominated by the token "to". Therefore, based on the current source prefix "我想吃" and the current target prefix "I want", a write operation can be executed to predict the next token as "to".
  • Figure 2: An overall schematic of the PsFuture policy. Based on the current source prefixes ${(x_1, x_2)}$, target prefixes ${(y_1, y_2)}$, and pseudo future information ${(x_3, x_4)}$ (tokens highlighted in red), the simultaneous translation model can directly perform adaptive read/write decisions.
  • Figure 3: Example of a Zh$\to$En divergence matrix $\mathbf{D}$, where $\mathbf{D}_{t,g(t)}=\mathbf{D}\left( \mathbf{p}^{\textnormal{part}}_t,\mathbf{p}^{\textnormal{pseudo}}_t \right)$. The red elements in the matrix denote a potential read/write path, determined by a predefined threshold $\lambda$ (0.2 in this case).
  • Figure 4: Comparison of BLEU vs. AL curves between multi-path (abbreviated as Mp) wait-$k$, ITST, DaP-SiMT, and our proposed PsFuture approach on three language pairs. PsFuture-W and PsFuture-O denote the multi-path wait-$k$ model based PsFuture method and the offline model (P2F-enhanced) based PsFuture method, respectively.
  • Figure 5: Effect of the pseudo-future suffix
  • ...and 5 more figures
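
The decision rule suggested by the captions of Figures 1 to 3 can be sketched roughly as follows: compare the next-token distribution computed from the current source prefix with the one computed when a pseudo-future suffix is appended, and write only if the two barely differ. This is a minimal illustration, not the authors' implementation. The function names, the toy distributions, and the use of total variation distance are all assumptions, since this excerpt does not say which divergence measure the paper uses. The threshold of 0.2 is taken from the Figure 3 caption.

```python
from typing import Sequence


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two distributions over the same vocabulary.

    Assumption: a stand-in for the divergence D(p_part, p_pseudo); the paper's
    actual measure is not specified in this excerpt.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same vocabulary")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def decide_action(
    p_part: Sequence[float],
    p_pseudo: Sequence[float],
    threshold: float = 0.2,  # lambda; 0.2 is the value quoted in the Figure 3 caption
) -> str:
    """Return "WRITE" if the pseudo future barely changes the prediction, else "READ"."""
    divergence = total_variation(p_part, p_pseudo)
    return "WRITE" if divergence < threshold else "READ"


# Toy next-token distributions over a hypothetical 3-token vocabulary.
p_part = [0.80, 0.10, 0.10]         # from the current source prefix only
p_pseudo_stable = [0.75, 0.15, 0.10]   # prediction barely moves with the pseudo future
p_pseudo_shifted = [0.20, 0.70, 0.10]  # prediction changes substantially

action_stable = decide_action(p_part, p_pseudo_stable)
action_shifted = decide_action(p_part, p_pseudo_shifted)
```

In this toy setting the first pair differs by a distance of about 0.05, so the sketch chooses to write, while the second pair differs by about 0.6, so it chooses to read more source tokens first.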