Transformer-Encoder Trees for Efficient Multilingual Machine Translation and Speech Translation

Yiwen Guan; Jacob Whitehill

Abstract

Multilingual translation suffers from computational redundancy, especially when translating into multiple languages simultaneously. In addition, translation quality can suffer for low-resource languages. To address both problems, we introduce the Transformer Encoder Tree (TET), a hierarchical, non-autoregressive, encoder-only architecture trained with Connectionist Temporal Classification (CTC) for multilingual translation. TET shares intermediate representations among linguistically similar target languages, which improves accuracy on low-resource languages, reduces computational redundancy, and allows all target languages to be generated in a single forward pass. TET also eliminates the sequential bottleneck of autoregressive models and supports fully parallel decoding of all tokens across all target languages. Compared to a naive one-to-many multilingual design, TET reduces the total parameter count by 66% and lowers inference computation by 60%. In speech translation, combining TET with a non-autoregressive speech recognition backbone (Wav2Vec2) achieves translation quality competitive with autoregressive systems while speeding up inference by approximately 7-14 times.
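To make the sharing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical three-level tree over five illustrative target languages (the language grouping, the `Node` class, and `forward_all` are all made up for this example) and replaces each Transformer encoder block with a stand-in that merely records which block ran. It compares how many block evaluations a shared tree needs against a naive design that runs a separate chain per target language.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node; stands in for a stack of Transformer encoder layers."""
    name: str
    children: list = field(default_factory=list)


def forward_all(node, hidden, calls, outputs):
    """Run each node's block once and reuse its output for every child."""
    hidden = hidden + [node.name]  # stand-in for applying this node's encoder block
    calls.append(node.name)        # record one block evaluation
    if not node.children:
        outputs[node.name] = hidden  # a leaf emits the output for one target language
    for child in node.children:
        forward_all(child, hidden, calls, outputs)


# Hypothetical grouping of target languages, chosen only for illustration.
tree = Node("root", [
    Node("romance", [Node("es"), Node("fr"), Node("it")]),
    Node("germanic", [Node("de"), Node("nl")]),
])

calls, outputs = [], {}
forward_all(tree, [], calls, outputs)

# Shared tree: every node is evaluated exactly once.
shared_calls = len(calls)
# Naive one-to-many design: each language re-runs its full root-to-leaf path.
naive_calls = sum(len(path) for path in outputs.values())

print(f"shared tree: {shared_calls} block evaluations")
print(f"naive chains: {naive_calls} block evaluations")
```

In this toy setting a single traversal produces outputs for all five leaves, and the shared tree needs fewer block evaluations than the per-language chains. The exact savings depend on the tree shape and the number of layers per node, so these counts are not expected to reproduce the 66% and 60% figures reported above.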
