Towards Robustness and Diversity: Continual Learning in Dialog Generation with Text-Mixup and Batch Nuclear-Norm Maximization

Zihan Wang, Jiayu Xiao, Mengxiang Li, Zhongjiang He, Yongxiang Li, Chao Wang, Shuangyong Song

TL;DR

This paper tackles catastrophic forgetting in continual learning for dialog generation by introducing Text-Mixup augmentation on replay memory and Batch Nuclear-Norm Maximization (BNNM) to promote representation diversity and reduce mode collapse. Using GPT-2 as the backbone, the approach interpolates the hidden representations of current-task and replay samples and encourages high-rank batch representations, improving performance on both task-oriented dialog and chitchat datasets. The method outperforms multiple continual learning baselines, and comprehensive ablations confirm the contribution of both the data augmentation and the representation regularization. The work advances practical continual learning for language generation, with potential impact on robust, multi-domain dialog systems in dynamic data environments.
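The Text-Mixup step described above can be pictured with a short NumPy sketch. It assumes the standard mixup recipe: a coefficient `lam` drawn from a Beta(`alpha`, `alpha`) distribution, a convex combination of hidden states from a current-task batch and a replay batch of the same shape, and the same coefficient applied to the two language-modeling losses. The function names `text_mixup` and `mixup_loss`, the `alpha` default, and the choice to mix losses rather than targets are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def text_mixup(h_current, h_replay, alpha=0.2, rng=None):
    """Hypothetical sketch: interpolate current-task and replay hidden states.

    h_current, h_replay: arrays of shape (batch, seq_len, hidden), assumed to be
    padded to the same length. Returns the mixed states and the coefficient lam.
    """
    if h_current.shape != h_replay.shape:
        raise ValueError("current and replay hidden states must have the same shape")
    rng = np.random.default_rng() if rng is None else rng
    # Standard mixup coefficient (assumption): lam ~ Beta(alpha, alpha), in [0, 1]
    lam = float(rng.beta(alpha, alpha))
    h_mix = lam * h_current + (1.0 - lam) * h_replay
    return h_mix, lam

def mixup_loss(loss_current, loss_replay, lam):
    # Assumed: weight the two language-modeling losses by the same coefficient
    return lam * loss_current + (1.0 - lam) * loss_replay

# Usage with random stand-ins for GPT-2-sized hidden states (hidden = 768)
rng = np.random.default_rng(0)
h_cur = rng.normal(size=(4, 16, 768))
h_rep = rng.normal(size=(4, 16, 768))
h_mix, lam = text_mixup(h_cur, h_rep, alpha=0.2, rng=rng)
print(h_mix.shape)  # (4, 16, 768)
```

Mixing at the hidden-state level, rather than at the token level, sidesteps the fact that discrete tokens cannot be interpolated directly.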

Abstract

In our dynamic world where data arrives in a continuous stream, continual learning enables us to incrementally add new tasks/domains without the need to retrain from scratch. A major challenge in continual learning of language models is catastrophic forgetting, the tendency of models to forget knowledge from previously trained tasks/domains when training on new ones. This paper studies dialog generation under the continual learning setting. We propose a novel method that 1) uses Text-Mixup as data augmentation to avoid model overfitting on replay memory and 2) leverages Batch Nuclear-Norm Maximization (BNNM) to alleviate the problem of mode collapse. Experiments on a 37-domain task-oriented dialog dataset and DailyDialog (a 10-domain chitchat dataset) demonstrate that our proposed approach outperforms state-of-the-art continual learning methods.
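A minimal NumPy sketch of a BNNM-style penalty, assuming each example in a batch is summarized by one hidden vector so the batch forms a `(batch, hidden)` matrix. The nuclear norm is the sum of that matrix's singular values; maximizing it (here, minimizing its negative) favors batches whose representations span more directions, which is the anti-collapse effect the abstract describes. The row L2-normalization, the division by batch size, and the names `batch_nuclear_norm` and `bnnm_loss` are assumptions for illustration. With unit-norm rows the Frobenius norm is fixed at sqrt(batch), so the nuclear norm can only grow by spreading the rows across more directions rather than by scaling them up. NumPy provides no gradients, so a training loop would compute the same quantity in an autodiff framework, for example with PyTorch's `torch.linalg.svdvals`.

```python
import numpy as np

def batch_nuclear_norm(reps, eps=1e-12):
    """Hypothetical sketch: nuclear norm of a (batch, hidden) representation matrix."""
    # Assumed: L2-normalize each row so the norm cannot grow by scaling alone
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / np.maximum(norms, eps)
    # Nuclear norm = sum of singular values
    return float(np.linalg.svd(unit, compute_uv=False).sum())

def bnnm_loss(reps):
    # Maximize the nuclear norm by minimizing its negative (batch-size scaling assumed)
    return -batch_nuclear_norm(reps) / reps.shape[0]

# A collapsed batch (identical rows) versus a diverse one (orthonormal rows)
collapsed = np.tile(np.array([[1.0, 2.0, 0.0, 0.0]]), (4, 1))
diverse = np.eye(4)
print(round(bnnm_loss(collapsed), 6))  # -0.5
print(round(bnnm_loss(diverse), 6))    # -1.0
```

The collapsed batch has rank 1 and a single singular value of 2, while the orthonormal batch has four singular values of 1, so the penalty is lower for the more diverse batch.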

Paper Structure

This paper contains 16 sections, 9 equations, 2 figures, and 10 tables.

Figures (2)

  • Figure 1: The performance of different continual learning methods: BLEU scores on each task/domain according to its position in the training curriculum after training on all tasks/domains sequentially.
  • Figure 2: The rank of representation matrix with and without BNNM during the whole training process.
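Figure 2 tracks the rank of the representation matrix during training. The summary does not say how that rank is measured, so the sketch below assumes a common numerical-rank definition: count the singular values larger than a relative tolerance `rel_tol` times the largest one. The name `representation_rank` and the tolerance value are hypothetical. A near-collapsed batch, where every row is the same vector plus tiny noise, reports rank 1, while a random Gaussian batch reports full rank.

```python
import numpy as np

def representation_rank(reps, rel_tol=1e-3):
    """Hypothetical numerical rank: singular values above rel_tol * largest one."""
    s = np.linalg.svd(reps, compute_uv=False)  # sorted in descending order
    if s.size == 0 or s[0] == 0.0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(0)
diverse = rng.normal(size=(32, 64))  # 32 representations, hidden size 64
v = rng.normal(size=(1, 64))
# Mode collapse stand-in: every row is the same vector plus tiny noise
collapsed = np.repeat(v, 32, axis=0) + 1e-6 * rng.normal(size=(32, 64))
print(representation_rank(diverse))    # 32
print(representation_rank(collapsed))  # 1
```

A relative tolerance keeps the measurement insensitive to the overall scale of the hidden states, which tends to drift during training.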