Continual Learning in Generative Adversarial Nets

Ari Seff, Alex Beatson, Daniel Suo, Han Liu

TL;DR

This work tackles continual learning for generative adversarial networks when data distributions evolve over time and old data are inaccessible. It adapts elastic weight consolidation to the GAN setting, applying a Fisher-information-based penalty to protect generator parameters critical for previously learned distributions, especially within conditional GANs. Empirical results on MNIST (MLP-GAN) and SVHN (DCGAN) show reduced forgetting of earlier classes while learning new ones, with robustness to the regularization strength and benefits from higher-capacity models. The approach enables scalable, sequential generative modeling without storing or re-synthesizing past data, advancing practical continual learning for generative models.
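The Fisher-information-based penalty mentioned above can be illustrated with a small sketch. It assumes the standard diagonal form of elastic weight consolidation, in which the loss on the new task is augmented by $\frac{\lambda}{2}\sum_i F_i(\theta_i - \theta^*_i)^2$, where $\theta^*$ are the parameters learned on earlier distributions and $F_i$ is a diagonal Fisher estimate. This summary does not spell out how the paper estimates the Fisher information for the generator, so the function names, the use of averaged squared per-sample gradients, and the flat parameter vector below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def diagonal_fisher(per_sample_grads):
    """Estimate a diagonal Fisher as the mean of squared per-sample gradients.

    Assumption: gradients arrive as an array of shape (n_samples, n_params),
    computed elsewhere at the parameters learned on the earlier distributions.
    """
    grads = np.asarray(per_sample_grads, dtype=float)
    return np.mean(grads ** 2, axis=0)


def ewc_penalty(theta, theta_star, fisher, lam):
    """Return (lam / 2) * sum_i fisher_i * (theta_i - theta_star_i)^2."""
    theta = np.asarray(theta, dtype=float)
    theta_star = np.asarray(theta_star, dtype=float)
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_star) ** 2))


def ewc_augmented_loss(task_loss, theta, theta_star, fisher, lam):
    """Add the EWC penalty to a (hypothetical) generator loss on the new task."""
    return task_loss + ewc_penalty(theta, theta_star, fisher, lam)


# Toy usage: the first parameter has high Fisher information, the second none,
# so only movement in the first parameter is penalized.
fisher = diagonal_fisher([[1.0, 0.0], [3.0, 0.0]])
penalty = ewc_penalty([1.0, 2.0], [0.0, 0.0], fisher, lam=2.0)
```

In this toy case the second parameter can move freely because its estimated Fisher information is zero, which is the mechanism that would let a generator adapt to a new class while staying close to the weights that matter for earlier ones.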

Abstract

Developments in deep generative models have allowed for tractable learning of high-dimensional data distributions. While the employed learning procedures typically assume that training data is drawn i.i.d. from the distribution of interest, it may be desirable to model distinct distributions which are observed sequentially, such as when different classes are encountered over time. Although conditional variations of deep generative models permit multiple distributions to be modeled by a single network in a disentangled fashion, they are susceptible to catastrophic forgetting when the distributions are encountered sequentially. In this paper, we adapt recent work in reducing catastrophic forgetting to the task of training generative adversarial networks on a sequence of distinct distributions, enabling continual generative modeling.

Paper Structure

This paper contains 15 sections, 7 equations, 6 figures.

Figures (6)

  • Figure 1: Conditionally sampled images of the digit 1 and digit 2 after training with the standard objective (top) and sampled images of the digits 1, 2, and 3 after training a new conditional input for digit 3 (bottom). When using the standard conditional GAN objective, the generator forgets how to sample from the previously learned distributions.
  • Figure 2: Conditionally sampled images of the digit 1 and digit 2 after training with the standard objective (top) and sampled images of the digits 1, 2, and 3 after training a new conditional input for digit 3 using the EWC-augmented objective (bottom). The generator no longer forgets how to produce samples from the previous categories.
  • Figure 3: Visualization of digit-specific Fisher information. A conditional GAN is trained on the digits $0$ through $5$ concurrently, and then the pixel-wise mean Fisher information for $G$'s output is computed for each conditional input.
  • Figure 4: Images sampled from a generator after training on different classes sequentially. Digits 0 and 1 were first trained concurrently, and the remaining digit classes were encountered one at a time sequentially. All samples were drawn after the last training session (after digit 6).
  • Figure 5: Sampling from $G$ at a fixed $z$ while training a new conditional input. Images of the digit 1 are sampled at the same fixed $z$ while training on the digit 3 with the standard objective (top right) and with the EWC-augmented objective (bottom right). Catastrophic forgetting is only visible with the standard objective. The Euclidean distance of the current sampled image from the original image is shown on the left under both training objectives.
  • ...and 1 more figure