SimulTron: On-Device Simultaneous Speech to Speech Translation
Alex Agranovich, Eliya Nachmani, Oleg Rybakov, Yifan Ding, Ye Jia, Nadav Bar, Heiga Zen, Michelle Tadmor Ramanovich
TL;DR
SimulTron targets real-time, on-device simultaneous speech-to-speech translation (S2ST) by extending the Translatotron lineage with a causal streaming encoder, a wait-$k$ attention-based decoder, and a streaming vocoder. The architecture supports on-device translation with latency adjustable through a fixed delay, and it runs on a Pixel 7 Pro. On MuST-C it achieves better BLEU and latency than prior real-time S2ST methods, and it also surpasses offline Translatotron baselines in several settings. Key findings include a BLEU improvement over Translatotron 1 in real-time Spanish–English translation, substantial BLEU gains in offline settings, and a clear latency-accuracy trade-off as the waiting parameter $k$ is varied. The work advances practical, privacy-preserving S2ST on mobile devices and lays the groundwork for broader multilingual on-device translation, given further vocoder and hardware optimizations.
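The fixed-delay behaviour of a wait-$k$ decoder can be illustrated with a minimal sketch. It assumes the textbook wait-$k$ rule from simultaneous text translation: the decoder waits for $k$ source steps, then emits one output step per additional source step. The unit of a "source step" in SimulTron (for example, encoder frames or chunks) is not specified here, and the function names below are hypothetical; this is an illustration of the policy, not the authors' implementation.

```python
from typing import List


def wait_k_visible_source(k: int, num_source: int, num_target: int) -> List[int]:
    """Return how many source steps the decoder may see at each target step.

    Textbook wait-k rule (assumed here): wait for k source steps, then emit
    one target step per additional source step. Target step t (0-based) sees
    the first min(k + t, num_source) source steps; once the source is
    exhausted, every remaining target step sees the full source.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [min(k + t, num_source) for t in range(num_target)]


def wait_k_attention_mask(k: int, num_source: int, num_target: int) -> List[List[bool]]:
    """Boolean cross-attention mask: mask[t][s] is True if target t may attend to source s."""
    visible = wait_k_visible_source(k, num_source, num_target)
    return [[s < visible[t] for s in range(num_source)] for t in range(num_target)]


if __name__ == "__main__":
    # Hypothetical sizes: 6 source steps and 5 output steps.
    for k in (1, 3):
        print(f"k={k}:", wait_k_visible_source(k, num_source=6, num_target=5))
    # k=1: [1, 2, 3, 4, 5]
    # k=3: [3, 4, 5, 6, 6]

    # Visualize the k=3 mask: "x" = attendable source step, "." = not yet received.
    for row in wait_k_attention_mask(3, 6, 5):
        print("".join("x" if m else "." for m in row))
```

Raising $k$ gives each output step more source context but delays it by more source steps, which is one way to picture the latency-accuracy trade-off as $k$ is varied.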
Abstract
Simultaneous speech-to-speech translation (S2ST) holds the promise of breaking down communication barriers and enabling fluid conversations across languages. However, achieving accurate, real-time translation on mobile devices remains a major challenge. We introduce SimulTron, a novel S2ST architecture designed to tackle this task. SimulTron is a lightweight direct S2ST model that builds on the strengths of the Translatotron framework while incorporating key modifications for streaming operation and an adjustable fixed delay. Our experiments show that SimulTron surpasses Translatotron 2 in offline evaluations. Furthermore, real-time evaluations show that SimulTron improves upon the performance of Translatotron 1. Additionally, SimulTron achieves better BLEU scores and lower latency than prior real-time S2ST methods on the MuST-C dataset. Notably, we have successfully deployed SimulTron on a Pixel 7 Pro device, demonstrating its potential for on-device simultaneous S2ST.
