Segmentation-Free Streaming Machine Translation

Javier Iranzo-Sánchez; Jorge Iranzo-Sánchez; Adrià Giménez; Jorge Civera; Alfons Juan

TL;DR

This paper proposes a Segmentation-Free framework that lets the model translate an unsegmented source stream by delaying the segmentation decision until after the translation has been generated. The framework achieves a better quality-latency trade-off than competing approaches that use an independent segmentation model.

Abstract

Streaming Machine Translation (MT) is the task of translating an unbounded input text stream in real time. The traditional cascade approach, which combines an Automatic Speech Recognition (ASR) system with an MT system, relies on an intermediate segmentation step that splits the transcription stream into sentence-like units. However, imposing a hard segmentation constrains the MT system and is a source of errors. This paper proposes a Segmentation-Free framework that enables the model to translate an unsegmented source stream by delaying the segmentation decision until after the translation has been generated. Extensive experiments show that the proposed Segmentation-Free framework achieves a better quality-latency trade-off than competing approaches that use an independent segmentation model. Software, data and models will be released upon paper acceptance.