Contrastive Feedback Mechanism for Simultaneous Speech Translation

Haotian Tan, Sakriani Sakti

TL;DR

The paper addresses the quality-latency trade-off in simultaneous speech translation (SST) and notes that unstable predictions from early chunks are typically ignored by existing decision policies. It introduces Contrastive Feedback Mechanism (CFM), which uses unstable predictions as feedback to refine subsequent chunk translations via a contrastive objective, formalized as $\mathrm{CFM\text{-}Score}(P_c; P_f) = \log P_c + \mathrm{Contrast}(P_c; P_f)$ with $\mathrm{Contrast}(P_c; P_f) = \log \frac{p_c(y_i \mid y_{<i})}{P_f}$ and a plausibility constraint $\mathcal{V}_\beta$ ($\beta = 0.1$). The method integrates with existing policies (AlignAtt, EDAtt, LA) by deriving feedback distributions $P_f$ from unstable predictions and rescoring current-chunk candidates accordingly. Empirical evaluation on MuST-C v1.0 across eight languages shows CFM consistently improves BLEU with negligible latency increases, achieving up to 2.05 BLEU-point gains (notably for LA on en→nl) and demonstrating the practicality of leveraging unstable predictions for real-time translation.
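As a rough illustration of the rescoring rule summarized above, the following Python sketch computes a CFM-style score for one decoding step. It is not the authors' implementation. The function name `cfm_score`, the toy probabilities, and the form of the plausibility constraint (keep only tokens whose current-chunk probability is at least $\beta$ times the maximum, as is common in contrastive decoding) are assumptions made for this example; the summary does not define $\mathcal{V}_\beta$ or say how $P_f$ is derived under each policy, so $P_f$ is simply taken as an input.

```python
import numpy as np


def cfm_score(log_p_c, log_p_f, beta=0.1):
    """Rescore next-token candidates of the current chunk (illustrative sketch).

    log_p_c: log-probabilities from the current chunk, shape (vocab,)
    log_p_f: log-probabilities of the feedback distribution built from
             unstable predictions of an earlier chunk, shape (vocab,)
    beta:    plausibility threshold (0.1 in the summary)
    """
    log_p_c = np.asarray(log_p_c, dtype=float)
    log_p_f = np.asarray(log_p_f, dtype=float)

    # Contrast(P_c; P_f) = log(P_c / P_f) = log P_c - log P_f
    contrast = log_p_c - log_p_f

    # CFM-Score = log P_c + Contrast(P_c; P_f)
    score = log_p_c + contrast

    # ASSUMPTION: plausibility set V_beta = {y : P_c(y) >= beta * max P_c}.
    # Tokens outside the set are excluded by assigning a score of -inf.
    plausible = log_p_c >= np.log(beta) + log_p_c.max()
    return np.where(plausible, score, -np.inf)


# Hypothetical toy distributions over a 4-token vocabulary.
p_c = np.array([0.50, 0.40, 0.06, 0.04])  # current chunk
p_f = np.array([0.70, 0.10, 0.10, 0.10])  # feedback from unstable predictions

scores = cfm_score(np.log(p_c), np.log(p_f))
best = int(np.argmax(scores))
print(scores, best)
```

In this toy case the current chunk alone would pick token 0, but that token is even more likely under the feedback distribution, so the contrast term penalizes it and token 1 receives the highest score. Token 3 falls below the assumed plausibility threshold and is excluded regardless of its contrast value.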

Abstract

Recent advances in simultaneous speech translation (SST) focus on the decision policies that enable the use of offline-trained ST models for simultaneous inference. These decision policies not only control the quality-latency trade-off in SST but also mitigate the impact of unstable predictions on translation quality by delaying translation for more context or discarding these predictions through stable hypothesis detection. However, these policies often overlook the potential benefits of utilizing unstable predictions. We introduce the contrastive feedback mechanism (CFM) for SST, a novel method that leverages these unstable predictions as feedback to improve translation quality. CFM guides the system to eliminate undesired model behaviors from these predictions through a contrastive objective. The experiments on 3 state-of-the-art decision policies across 8 languages in the MuST-C v1.0 dataset show that CFM effectively improves the performance of SST.

Paper Structure

This paper contains 15 sections, 3 equations, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Framework of the SST with CFM. Top: CFM leverages unstable predictions from an earlier chunk (marked as 1) as feedback to enhance the prediction of a subsequent chunk (marked as 2). Bottom: An English-German translation example with/without CFM. The word "light" can be translated as "heller" (illumination) or "leichter" (weight). CFM helps to filter out the undesired model behavior of translating "light" to "heller" inappropriately.
  • Figure 2: Offline translation quality comparison of different ST models.
  • Figure 3: Quality-latency trade-off of different chunk sizes combined with AlignAtt and EDAtt policies.
  • Figure 4: Maximum BLEU score improvements (shown in column charts) and their associated latency increases (depicted in line charts) correspond to bold values in Tables 1 and 2.