Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation
Sai Akarsh, Vamshi Raghusimha, Anindita Mondal, Anil Vuppala
TL;DR
This work addresses the lack of natural intonation in speech-to-speech machine translation (SSMT) by developing a stress-annotated Indian English dataset and a text-to-speech (TTS) system that can inject stress into synthesized Hindi speech. The authors combine a stress-detection module with a FastPitch-based TTS, which uses a PDE Modifier to adjust the pitch, duration, and energy of stressed words. Together these enable stress transfer in an Indian English→Hindi SSMT pipeline. Key contributions include a publicly released, stress-annotated Indian English dataset, a stress-detection approach that uses frame- and word-level cues, and a PDE-driven TTS mechanism for prosody-aware translation. Subjective evaluations suggest that stress transfer is feasible (st-MOS ≈ 3.96). Overall MOS, however, benefits from keeping stress modulation controlled. The authors see this as potential for improving engagement with educational content across Indian languages.
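To make the idea of a PDE Modifier concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the TTS predicts one pitch, duration, and energy value per input token, as FastPitch does per input symbol. It also assumes that pitch is mean/std-normalized, so a stressed token's pitch is shifted additively, while duration and energy are scaled multiplicatively. The names `Variances`, `word_stress_to_token_mask`, and `apply_pde_modifier` are hypothetical. The shift and scale values are illustrative placeholders, not values from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Variances:
    """Hypothetical per-token variances predicted by a FastPitch-style TTS."""
    pitch: List[float]     # assumed mean/std-normalized pitch per token
    duration: List[int]    # predicted duration per token, in frames
    energy: List[float]    # predicted energy per token


def word_stress_to_token_mask(tokens_per_word: Sequence[int],
                              stressed_words: Sequence[bool]) -> List[bool]:
    """Expand word-level stress flags into a per-token mask."""
    mask: List[bool] = []
    for n_tokens, is_stressed in zip(tokens_per_word, stressed_words):
        mask.extend([is_stressed] * n_tokens)
    return mask


def apply_pde_modifier(v: Variances, stressed: Sequence[bool],
                       pitch_shift: float = 0.5,      # placeholder value
                       duration_scale: float = 1.2,   # placeholder value
                       energy_scale: float = 1.15     # placeholder value
                       ) -> Variances:
    """Return new variances with stressed tokens raised in pitch, lengthened, and louder."""
    if not (len(v.pitch) == len(v.duration) == len(v.energy) == len(stressed)):
        raise ValueError("all sequences must have one entry per token")
    pitch = [p + pitch_shift if s else p for p, s in zip(v.pitch, stressed)]
    # Durations are whole frames, so the scaled value is rounded back to an int.
    duration = [max(1, round(d * duration_scale)) if s else d
                for d, s in zip(v.duration, stressed)]
    energy = [e * energy_scale if s else e for e, s in zip(v.energy, stressed)]
    return Variances(pitch, duration, energy)
```

For example, with three single-token words where only the middle word is stressed, `apply_pde_modifier(Variances([0.0, 0.1, 0.2], [2, 3, 4], [1.0, 1.0, 1.0]), word_stress_to_token_mask([1, 1, 1], [False, True, False]))` changes only the second token. Its pitch rises to 0.6, its duration to 4 frames, and its energy to 1.15. Keeping the modification small and local reflects the observation above that controlled stress modulation helps overall MOS.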
Abstract
Linguistic diversity in India's education sector poses a significant challenge to inclusivity. Online educational content has democratized access to knowledge. However, the dominance of English as the internet's lingua franca limits accessibility, which underscores the need for translation into Indian languages. Existing Speech-to-Speech Machine Translation (SSMT) systems lack intonation and produce monotonous translations, leading to a loss of audience interest and disengagement from the content. To address this, our paper introduces a dataset of Indian English speech with stress annotations. It also introduces a Text-to-Speech (TTS) architecture that can incorporate stress into synthesized speech. The dataset is used to train a stress detection model. The SSMT system then uses this model to detect stress in the source speech and transfer it to the target-language speech. The TTS architecture is based on FastPitch and can modify the predicted variances (pitch, duration, and energy) for the given stressed words. We present an Indian English-to-Hindi SSMT system that transfers stress, with the aim of enhancing the overall quality and engagement of educational content.
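The stress detection step can be pictured as turning frame-level evidence into one decision per word. The sketch below illustrates one simple way to do this; it is an assumption, not the paper's model. It supposes that some classifier has already produced a stress probability for every acoustic frame and that word boundaries (start and end frame indices, end exclusive) are known, for example from a forced aligner. It then mean-pools the frame scores inside each word and applies a threshold. The function name `detect_word_stress` and the 0.5 threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_word_stress(frame_scores: Sequence[float],
                       word_bounds: Sequence[Tuple[int, int]],
                       threshold: float = 0.5) -> List[bool]:
    """Mark a word as stressed when its mean frame-level stress score reaches the threshold.

    frame_scores: hypothetical per-frame stress probabilities in [0, 1].
    word_bounds: (start, end) frame indices per word, end exclusive.
    """
    decisions: List[bool] = []
    for start, end in word_bounds:
        segment = frame_scores[start:end]
        if len(segment) == 0:
            raise ValueError(f"empty word span ({start}, {end})")
        # Mean pooling: aggregate frame-level evidence into a word-level score.
        word_score = sum(segment) / len(segment)
        decisions.append(word_score >= threshold)
    return decisions
```

With `frame_scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3]` and `word_bounds = [(0, 2), (2, 5), (5, 6)]`, only the second word has a mean score above 0.5, so the result is `[False, True, False]`. A word-level list like this is the kind of input a stress-aware TTS would need. Mapping it from the source words onto the translated Hindi words is a separate step that this sketch does not cover.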
