Leveraging Diverse Modeling Contexts with Collaborating Learning for Neural Machine Translation

Yusheng Liao; Yanfeng Wang; Yu Wang

TL;DR

This work tackles the trade-off between translation quality and decoding speed by enabling collaboration between autoregressive (AR) and non-autoregressive (NAR) neural machine translation models. It introduces Diverse Context Modeling with Collaborative Learning (DCMCL), which combines token-level mutual learning and sequence-level contrastive learning within a shared-encoder architecture, plus a hybrid teacher variant to stabilize training. On four widely used MT benchmarks, DCMCL improves AR models by up to 1.38 BLEU and NAR models by up to 2.98 BLEU, outperforming state-of-the-art unified models and prior mutual-learning baselines while reducing NAR multi-modality and enhancing AR semantic coherence. Ablations and analyses indicate that encoder sharing, token-level interactions, and sequence-level alignment each contribute to cross-model context exploitation and robust translation performance.

Abstract

Autoregressive (AR) and non-autoregressive (NAR) models are two types of generative models for neural machine translation (NMT). AR models predict tokens one at a time and can effectively capture the distribution of real translations. NAR models predict tokens using bidirectional contextual information, which speeds up inference but degrades translation quality. Previous work either used AR models to enhance NAR models by reducing the complexity of the training data, or used NAR models to incorporate global information into AR models. However, these methods exploit the contextual information of only a single type of model and neglect the diversity of contextual information that different types of models can provide. In this paper, we propose a novel generic collaborative learning method, DCMCL, in which AR and NAR models are treated as collaborators rather than as teachers and students. To leverage the bilateral contextual information hierarchically, token-level mutual learning and sequence-level contrastive learning are adopted between the AR and NAR models. Extensive experiments on four widely used benchmarks show that DCMCL improves AR and NAR models simultaneously, by up to 1.38 and 2.98 BLEU respectively, and outperforms the current best unified model by up to 0.97 BLEU for both AR and NAR decoding.
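The two collaborative objectives named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the token-level term is a symmetric KL divergence between the AR and NAR output distributions at masked positions, and that the sequence-level term is an InfoNCE-style loss with cosine similarity and in-batch negatives. The function names, the `temperature` value, and the toy inputs are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def token_level_mutual_loss(ar_logits, nar_logits, masked):
    """Mean symmetric KL between AR and NAR token distributions.

    Only positions flagged in `masked` contribute (assumption: mutual
    learning is restricted to masked target tokens).
    """
    total, count = 0.0, 0
    for ar, nar, is_masked in zip(ar_logits, nar_logits, masked):
        if not is_masked:
            continue
        p, q = softmax(ar), softmax(nar)
        total += kl_divergence(p, q) + kl_divergence(q, p)
        count += 1
    return total / count if count else 0.0


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def sequence_level_contrastive_loss(ar_vecs, nar_vecs, temperature=0.1):
    """InfoNCE-style loss over a batch of sentence vectors.

    The AR and NAR vectors of the same sentence (same index) form the
    positive pair; all other NAR vectors in the batch act as negatives.
    """
    n = len(ar_vecs)
    loss = 0.0
    for i in range(n):
        sims = [cosine(ar_vecs[i], nar_vecs[j]) / temperature for j in range(n)]
        m = max(sims)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += log_denominator - sims[i]
    return loss / n


if __name__ == "__main__":
    # Toy example: two target positions, vocabulary of three tokens.
    ar_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
    nar_logits = [[1.8, 0.7, 0.2], [0.1, 0.4, 2.0]]
    print(token_level_mutual_loss(ar_logits, nar_logits, masked=[True, False]))

    # Toy batch of two sentence-level representations per decoder.
    ar_vecs = [[1.0, 0.0], [0.0, 1.0]]
    nar_vecs = [[0.9, 0.1], [0.2, 0.8]]
    print(sequence_level_contrastive_loss(ar_vecs, nar_vecs))
```

In a real training setup these terms would be computed on batched tensors and added to the usual translation losses of both decoders; how the terms are weighted, and whether gradients are stopped on the side acting as the reference distribution, are design choices this sketch does not address.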

Paper Structure (36 sections, 23 equations, 6 figures, 11 tables)

Introduction
Preliminary
Machine Translation
Mutual Learning on Sequence Models
Contrastive Learning on Sequence Models
Method
Model Structure
Shared Encoder
Decoders
DCMCL Training Framework
Multi-task Learning
Token-level Mutual Learning
Sequence-level Contrastive Learning
Learning with Hybrid Teacher
Experiment
...and 21 more sections

Figures (6)

Figure 1: Overview of the proposed method. The model structure comprises three parts: the shared encoder, the AR decoder, and the NAR decoder. Token-level mutual learning is applied only at the masked input tokens $[M]$. In sequence-level contrastive learning, solid lines connect positive pairs, and dotted lines connect negative pairs sampled from the batch.
Figure 2: Performance under different mask ratios during training on IWSLT14 DE-EN validation data.
Figure 3: The averaged token-level similarity between the hidden states of the AR and NAR decoders on the IWSLT14 DE-EN training corpus. The dark case adds sequence-level contrastive learning on top of the light case.
Figure 4: Performance under different target sentence lengths on IWSLT14 EN-DE test data.
Figure 5: Performance under different target sentence lengths on WMT14 EN-DE test data.
...and 1 more figure
