A Modular-based Strategy for Mitigating Gradient Conflicts in Simultaneous Speech Translation
Xiaoqian Liu, Yangfan Du, Jianjin Wang, Yuan Ge, Chen Xu, Tong Xiao, Guocheng Chen, Jingbo Zhu
TL;DR
This work tackles gradient conflicts in multi-task learning for simultaneous speech translation (SimulST) by introducing Modular Gradient Conflict Mitigation (MGCM). MGCM splits the model into modules (LN, FFN, Attention) and checks each module for conflicts using the cosine similarity between the primary-task and auxiliary-task gradients. When a conflict exists, it projects the auxiliary gradient onto the plane orthogonal to the primary gradient, which avoids the memory-heavy practice of concatenating all gradients. The method improves translation quality, notably under medium and high latency, with offline BLEU gains of about +0.68 (Greedy) and +0.63 (Beam5) on a DiSeg baseline, and it cuts GPU memory use by more than 95% relative to other conflict-resolution approaches. Experiments on MuST-C En-De show that MGCM outperforms model-level approaches such as PCGrad and simple discard strategies, with statistically significant improvements (p < 0.05) and memory requirements that scale to larger models. Overall, MGCM offers a practical, scalable way to mitigate gradient conflicts in SimulST at the module level, delivering strong performance with reduced GPU memory consumption.
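The per-module check and projection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy gradients, the module keys, and the choice to treat a negative cosine similarity as a conflict are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def project_if_conflicting(g_primary, g_aux):
    """Return g_aux, with its component along g_primary removed
    when the two gradients conflict.

    Assumption: "conflict" means negative cosine similarity.
    """
    if cosine(g_primary, g_aux) >= 0:
        return list(g_aux)  # no conflict: keep the auxiliary gradient
    # Projection onto the plane orthogonal to g_primary.
    scale = dot(g_aux, g_primary) / dot(g_primary, g_primary)
    return [a - scale * p for a, p in zip(g_aux, g_primary)]


def mitigate_per_module(primary_grads, aux_grads):
    """Run the check independently on each module's flattened gradient,
    so no model-wide concatenated gradient is ever built."""
    return {
        name: project_if_conflicting(primary_grads[name], aux_grads[name])
        for name in primary_grads
    }


# Toy flattened gradients keyed by hypothetical module names.
primary = {"attention": [1.0, 0.0], "ffn": [0.0, 1.0], "ln": [1.0, 1.0]}
auxiliary = {"attention": [-1.0, 1.0], "ffn": [1.0, 1.0], "ln": [0.5, -0.5]}

adjusted = mitigate_per_module(primary, auxiliary)
```

In this toy example only the `attention` gradients conflict, so only that auxiliary gradient is changed; after projection it is orthogonal to the primary gradient. How the adjusted auxiliary gradients are then combined with the primary gradients for the parameter update is not specified in this summary and is left out of the sketch.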
Abstract
Simultaneous Speech Translation (SimulST) involves generating target language text while continuously processing streaming speech input, presenting significant real-time challenges. Multi-task learning is often employed to enhance SimulST performance but introduces optimization conflicts between primary and auxiliary tasks, potentially compromising overall efficiency. Existing model-level conflict resolution methods are not well suited to this task, which exacerbates inefficiencies and leads to high GPU memory consumption. To address these challenges, we propose a Modular Gradient Conflict Mitigation (MGCM) strategy that detects conflicts at a finer-grained modular level and resolves them using gradient projection. Experimental results demonstrate that MGCM significantly improves SimulST performance, particularly under medium and high latency conditions, achieving a 0.68 BLEU score gain in offline tasks. Additionally, MGCM reduces GPU memory consumption by over 95% compared to other conflict mitigation methods, establishing it as a robust solution for SimulST tasks.
