Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation
DongNyeong Heo, Heeyoul Choi
TL;DR
This work tackles the quality-speed trade-off in non-autoregressive NMT by introducing LadderNMT, a dual hierarchical latent variable model with a shared intermediate latent space across languages. By employing ladder inference to estimate a posterior over a shared Z without a separate posterior network, the method reduces parameters and mitigates one-sided posterior collapse, while fostering language-agnostic representations. Empirical results on WMT tasks show superior or comparable BLEU scores with substantially fewer parameters, and qualitative analyses (2-D visualizations and CCA) demonstrate that LadderNMT learns more aligned, language-agnostic latent spaces. The approach also yields robust performance gains when applied to both LaNMT and FullyNAT architectures, suggesting strong practical impact for fast and accurate NAT systems and potential extensions to multilingual and cross-modal tasks.
Abstract
Non-autoregressive neural machine translation (NAT) offers a substantial translation speed-up over autoregressive neural machine translation (AT) at the cost of translation quality. Latent variable modeling has emerged as a promising approach to bridging this quality gap, particularly for addressing the chronic multimodality problem in NAT. Previous works that used latent variable modeling added an auxiliary model to estimate the posterior distribution of the latent variable conditioned on the source and target sentences. However, this design has several disadvantages: it extracts redundant information into the latent variable, increases the number of parameters, and tends to ignore some information from the inputs. In this paper, we propose a novel latent variable model that integrates a dual reconstruction perspective and an advanced hierarchical latent modeling with a shared intermediate latent space across languages. We hypothesize that this modeling alleviates or prevents the above disadvantages. In our experiments, we demonstrate comprehensively that the proposed approach infers superior latent variables, which lead to better translation quality. Finally, on benchmark translation tasks such as WMT, we demonstrate that our proposed method significantly improves translation quality compared to previous NAT baselines, including the state-of-the-art NAT model.
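
For reference, latent-variable models that rely on an auxiliary posterior network, as in the previous works mentioned above, are commonly trained by maximizing a conditional evidence lower bound. The following is a sketch of that standard objective, not the formulation proposed in this paper. The notation is assumed here and does not come from the abstract: $x$ is the source sentence, $y$ the target sentence, $z$ the latent variable, $q_\phi$ the auxiliary posterior model, and $p_\theta$ the prior and decoder.

```latex
% Standard conditional ELBO (assumed notation; not this paper's objective)
\log p_\theta(y \mid x)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(y \mid x, z)\big]
  \;-\;
  \mathrm{KL}\big(q_\phi(z \mid x, y)\,\|\,p_\theta(z \mid x)\big)
```

Under this reading, the separate network $q_\phi$ accounts for the additional parameters, and a KL term that is driven close to zero means the posterior carries little information beyond the prior, which is one common way a latent variable ends up ignoring part of its inputs.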
