Generative replay with feedback connections as a general strategy for continual learning

Gido M. van de Ven, Andreas S. Tolias

TL;DR

The paper addresses catastrophic forgetting in continual learning. It formalizes three evaluation scenarios, distinguished by whether task identity is known and, if it is not, whether it must be inferred. Generative replay with distillation performs best in all three scenarios, whereas regularization-based methods fail when task identity must be inferred. The paper also introduces Replay-through-Feedback (RtF), which integrates the generative model into the main network via feedback connections to reduce training cost. On the split and permuted MNIST protocols, RtF substantially shortens training time with no or negligible loss in accuracy, supporting generative replay with distillation as a general continual-learning strategy. The work also discusses limitations and future directions for scaling to more complex inputs and tasks, as well as privacy considerations when data storage is restricted.

Abstract

A major obstacle to developing artificial intelligence applications capable of true lifelong learning is that artificial neural networks quickly or catastrophically forget previously learned tasks when trained on a new one. Numerous methods for alleviating catastrophic forgetting are currently being proposed, but differences in evaluation protocols make it difficult to directly compare their performance. To enable more meaningful comparisons, here we identified three distinct scenarios for continual learning based on whether task identity is known and, if it is not, whether it needs to be inferred. Performing the split and permuted MNIST task protocols according to each of these scenarios, we found that regularization-based approaches (e.g., elastic weight consolidation) failed when task identity needed to be inferred. In contrast, generative replay combined with distillation (i.e., using class probabilities as "soft targets") achieved superior performance in all three scenarios. Addressing the issue of efficiency, we reduced the computational cost of generative replay by integrating the generative model into the main model by equipping it with generative feedback or backward connections. This Replay-through-Feedback approach substantially shortened training time with no or negligible loss in performance. We believe this to be an important first step towards making the powerful technique of generative replay scalable to real-world continual learning applications.
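
To make the "soft targets" idea concrete, the following is a minimal sketch of a distillation loss in plain Python. It is not the authors' implementation: the function names, the example logits, and the temperature value are all illustrative assumptions. The sketch only shows the general idea that a replayed input is labeled with the previous model's class probabilities rather than a single hard class.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into class probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, soft_targets, temperature=1.0):
    """Cross-entropy between soft targets and the student's softened predictions."""
    probs = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(soft_targets, probs) if t > 0)


# Hypothetical example: the previous model's output on one replayed input.
# The temperature of 2.0 is an assumed value chosen for illustration.
old_model_logits = [2.0, 0.5, -1.0]
soft_targets = softmax(old_model_logits, temperature=2.0)

# A student that still agrees with the previous model incurs a lower loss
# than one whose predictions have drifted after training on a new task.
matching = distillation_loss([2.0, 0.5, -1.0], soft_targets, temperature=2.0)
drifted = distillation_loss([-1.0, 0.5, 2.0], soft_targets, temperature=2.0)
print(matching < drifted)
```

In a replay setup of this kind, the inputs fed to both models would come from a generative model rather than from stored data, and this loss would be combined with the ordinary loss on the current task's data.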

Paper Structure

This paper contains 24 sections, 13 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Schematic of the split MNIST task protocol.
  • Figure 2: Schematic of the permuted MNIST task protocol.
  • Figure 3: RtF schematic.
  • Figure 4: Average test accuracy (over all tasks) on the split MNIST protocol plotted against training time. Each experiment was run 20 times: dots represent individual runs, stars indicate the mean.
  • Figure 5: Same as Figure 4, except on the permuted MNIST task protocol.
  • ...and 3 more figures