Self-generated Replay Memories for Continual Neural Machine Translation

Michele Resta, Davide Bacciu

TL;DR

This work tackles catastrophic forgetting in continual multilingual neural machine translation by proposing SG-Rep, a replay-based method that uses the model itself as a generator of synthetic parallel sentences. The approach maintains a fixed-size replay memory populated with self-generated pseudo-samples, which are filtered and translated to form training data for future experiences, thereby mitigating forgetting without explicit memorization of real past data. Across IWSLT17 and UNPC datasets, SG-Rep consistently outperforms traditional continual learning baselines and approaches the performance of joint training, demonstrating strong robustness to experience order and token diversity challenges. The method offers a practical pathway for continual, privacy-conscious multilingual NMT with manageable computational overhead and clear applicability to real-world multilingual deployment.

Abstract

Modern Neural Machine Translation systems exhibit strong performance in several different languages and are constantly improving. Their ability to learn continuously is, however, still severely limited by catastrophic forgetting. In this work, we leverage a key property of encoder-decoder Transformers, i.e., their generative ability, to propose a novel approach to continual learning for Neural Machine Translation systems. We show how a model can effectively learn on a stream of experiences comprising different languages by leveraging a replay memory populated by using the model itself as a generator of parallel sentences. We empirically demonstrate that our approach can counteract catastrophic forgetting without requiring explicit memorization of training data. Code will be publicly available upon publication. Code: https://github.com/m-resta/sg-rep
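
The replay loop summarized in the TL;DR (generate pseudo-samples, filter them, translate them, keep a fixed-size memory) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `ReplayMemory`, `self_generate`, `generate_fn`, `keep_fn`, and `translate_fn` are hypothetical, the equal per-experience split of the memory is an assumption made for illustration, and the toy functions at the end only stand in for a real NMT model.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


class ReplayMemory:
    """Fixed-size store of self-generated (source, target) pairs.

    Assumption: the capacity is split equally among the experiences
    seen so far. The paper may allocate the memory differently.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.slots: List[List[Pair]] = []  # one list per finished experience
        self.rng = random.Random(seed)

    def add_experience(self, pairs: List[Pair]) -> None:
        self.slots.append(list(pairs))
        share = self.capacity // len(self.slots)
        # Subsample every slot so the total never exceeds `capacity`.
        self.slots = [
            self.rng.sample(slot, min(share, len(slot))) for slot in self.slots
        ]

    def all_pairs(self) -> List[Pair]:
        return [pair for slot in self.slots for pair in slot]


def self_generate(
    generate_fn: Callable[[], str],
    keep_fn: Callable[[str], bool],
    translate_fn: Callable[[str], str],
    n_wanted: int,
    max_tries: int = 10_000,
) -> List[Pair]:
    """Sample pseudo-sources, drop those rejected by the filter, translate the rest."""
    pairs: List[Pair] = []
    tries = 0
    while len(pairs) < n_wanted and tries < max_tries:
        tries += 1
        source = generate_fn()
        if keep_fn(source):
            pairs.append((source, translate_fn(source)))
    return pairs


# Toy stand-ins (hypothetical): a real system would call the NMT model here.
_toy_rng = random.Random(1)


def toy_generate() -> str:
    length = _toy_rng.randint(0, 4)
    return " ".join(_toy_rng.choice(["a", "b", "c"]) for _ in range(length))


def toy_keep(sentence: str) -> bool:
    return len(sentence.split()) >= 2  # discard empty or one-token outputs


def toy_translate(sentence: str) -> str:
    return sentence.upper()


memory = ReplayMemory(capacity=10)
for _ in range(3):  # three consecutive experiences
    new_pairs = self_generate(toy_generate, toy_keep, toy_translate, n_wanted=10)
    memory.add_experience(new_pairs)
```

In a training loop, the pairs returned by `memory.all_pairs()` would be mixed with the real data of the next experience, so no real sentences from earlier experiences need to be stored.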

Paper Structure

This paper contains 30 sections, 1 equation, 6 figures, 17 tables, and 1 algorithm.

Figures (6)

  • Figure 1: A scheme of the CILL setting. A model is trained incrementally on a stream of experiences comprising training data for various language pairs.
  • Figure 2: Forgetting curve of the different approaches. Average BLEU score on the first task evaluated at the end of the training process of each experience.
  • Figure 3: In light blue, frequencies of the top-200 sub-word tokens of experience 1. The green bars represent the frequencies of those tokens that also appear among the 200 most frequent tokens of experience 2.
  • Figure 4: Effect of different memory sizes for A-GEM and SG-Rep.
  • Figure 5: Average BLEU score on all language directions at the end of each training experience.
  • ...and 1 more figure