Contrastive Feedback Mechanism for Simultaneous Speech Translation

Haotian Tan; Sakriani Sakti

TL;DR

The paper addresses the quality-latency trade-off in simultaneous speech translation (SST) and notes that existing decision policies typically discard the unstable predictions produced from early chunks. It introduces the Contrastive Feedback Mechanism (CFM), which uses these unstable predictions as feedback to refine subsequent chunk translations via a contrastive objective, formalized as $\text{CFM-Score}(P_c;P_f) = \log P_c + \mathrm{Contrast}(P_c;P_f)$ with $\mathrm{Contrast}(P_c;P_f) = \log \frac{p_c(y_i \mid y_{<i})}{P_f}$ and a plausibility constraint $\mathcal{V}_\beta$ ($\beta=0.1$). The method integrates with existing policies (AlignAtt, EDAtt, LA) by deriving feedback distributions $P_f$ from unstable predictions and rescoring current-chunk candidates accordingly. Empirical evaluation on MuST-C v1.0 across eight languages shows that CFM consistently improves BLEU with negligible latency increases, achieving gains of up to 2.05 BLEU points (notably for LA on en→nl) and demonstrating the practicality of leveraging unstable predictions for real-time translation.
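To make the rescoring step concrete, here is a minimal Python sketch of the CFM score for a single decoding step. Several parts are assumptions rather than details taken from the paper: the function name `cfm_scores`, the dictionary representation of $P_c$ and $P_f$, the small probability floor `eps`, and the form of the plausibility constraint, which is assumed to follow the usual contrastive-decoding convention (keep a token only if its current-chunk probability is at least $\beta$ times the highest current-chunk probability). The probabilities are invented for illustration and only echo the "light" example from Figure 1.

```python
import math


def cfm_scores(p_c, p_f, beta=0.1, eps=1e-9):
    """Rescore current-chunk token probabilities against feedback probabilities.

    p_c: dict token -> probability under the current chunk (P_c)
    p_f: dict token -> probability under the feedback distribution (P_f)
    beta: plausibility threshold (0.1 in the summary above)
    eps: hypothetical floor that avoids division by zero for unseen tokens
    """
    # Assumed plausibility constraint: keep tokens whose current-chunk
    # probability is at least beta times the best current-chunk probability.
    threshold = beta * max(p_c.values())
    scores = {}
    for token, pc in p_c.items():
        if pc < threshold:
            scores[token] = -math.inf  # implausible tokens are excluded
            continue
        pf = max(p_f.get(token, 0.0), eps)
        # CFM-Score = log P_c + Contrast, with Contrast = log(P_c / P_f)
        scores[token] = math.log(pc) + math.log(pc / pf)
    return scores


# Illustrative (made-up) distributions for translating "light".
p_c = {"leichter": 0.45, "heller": 0.50, "und": 0.01, "der": 0.04}
p_f = {"leichter": 0.10, "heller": 0.80, "und": 0.05, "der": 0.05}

scores = cfm_scores(p_c, p_f)
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy case "heller" has the highest raw current-chunk probability, but because the feedback distribution favours it even more strongly, its contrast term is negative and "leichter" ends up with the higher CFM score.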

Abstract

Recent advances in simultaneous speech translation (SST) focus on the decision policies that enable the use of offline-trained ST models for simultaneous inference. These decision policies not only control the quality-latency trade-off in SST but also mitigate the impact of unstable predictions on translation quality by delaying translation for more context or discarding these predictions through stable hypothesis detection. However, these policies often overlook the potential benefits of utilizing unstable predictions. We introduce the contrastive feedback mechanism (CFM) for SST, a novel method that leverages these unstable predictions as feedback to improve translation quality. CFM guides the system to eliminate undesired model behaviors from these predictions through a contrastive objective. The experiments on 3 state-of-the-art decision policies across 8 languages in the MuST-C v1.0 dataset show that CFM effectively improves the performance of SST.
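The "stable hypothesis detection" mentioned above can be illustrated with a local-agreement style rule. This sketch assumes that the LA policy refers to local agreement, where only the prefix shared by two consecutive chunk hypotheses is emitted. The function name `split_by_local_agreement` and the token lists are hypothetical. How CFM turns the unstable tokens into a feedback distribution is not specified in this summary, so the sketch stops at identifying them.

```python
def split_by_local_agreement(prev_hyp, curr_hyp):
    """Return (stable_prefix, unstable_tail_of_prev_hyp).

    stable_prefix: tokens on which both consecutive hypotheses agree
    unstable_tail_of_prev_hyp: earlier-chunk tokens that a conventional
        policy would discard (the kind of prediction CFM reuses as feedback)
    """
    n = 0
    for prev_tok, curr_tok in zip(prev_hyp, curr_hyp):
        if prev_tok != curr_tok:
            break
        n += 1
    return curr_hyp[:n], prev_hyp[n:]


# Hypothetical hypotheses from two consecutive speech chunks.
prev_hyp = ["das", "ist", "heller"]
curr_hyp = ["das", "ist", "leichter", "als"]

stable, unstable = split_by_local_agreement(prev_hyp, curr_hyp)
print(stable, unstable)
```

Here the agreed prefix would be emitted, while the earlier chunk's diverging token is the unstable prediction that is normally thrown away.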

Paper Structure (15 sections, 3 equations, 4 figures, 2 tables, 1 algorithm)

Introduction
Contrastive Feedback Mechanism
CFM-Enhanced SST
Feedback Information
Experimental Setup
Data
Offline ST Model
Decision Policies
Evaluation
Experiments and Results
Offline Results
Chunk Size
Simultaneous performance of CFM
Conclusion
Acknowledgements

Figures (4)

Figure 1: Framework of the SST with CFM. Top: CFM leverages unstable predictions from an earlier chunk (marked as 1) as feedback to enhance the prediction of a subsequent chunk (marked as 2). Bottom: An English-German translation example with/without CFM. The word "light" can be translated as "heller" (illumination) or "leichter" (weight). CFM helps to filter out the undesired model behavior of translating "light" to "heller" inappropriately.
Figure 2: Offline translation quality comparison of different ST models.
Figure 3: Quality-latency trade-off of different chunk sizes combined with AlignAtt and EDAtt policies.
Figure 4: Maximum BLEU score improvements (shown in column charts) and their associated latency increases (depicted in line charts), corresponding to the bold values in Tables 1 and 2.
