
Deep Generative Dual Memory Network for Continual Learning

Nitin Kamra, Umang Gupta, Yan Liu

TL;DR

This work tackles catastrophic forgetting in continual learning by introducing a deep generative dual memory network (DGDMN) that mimics hippocampal-neocortical memory through a short-term memory (STM) and a long-term memory (LTM). A core component, Deep Generative Replay (DGR), uses a variational autoencoder-based generator to replay past experiences and consolidate knowledge. DGDMN adds small short-term task memories (STTMs) for rapid task acquisition and a consolidation "sleep" phase that updates the LTM. Empirical results across several sequential-task benchmarks show that DGDMN and its DGR variant outperform baselines in average accuracy and exhibit substantially reduced forgetting, with DGDMN offering faster training and better long-term retention on long task sequences. The work also connects the architecture to neuroscience-inspired concepts like sleep and complementary learning systems, providing both practical continual-learning gains and insights into memory-inspired learning dynamics.
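The replay step described above can be illustrated with a small sketch. It assumes a toy setting: a per-class Gaussian mixture (`GaussianMixtureGenerator`) stands in for the paper's VAE generator, and a nearest-class-mean classifier (`NearestMeanLearner`) stands in for the learner. Both names, `train_task`, and the blob data are hypothetical. What it shows is the core DGR idea: before learning a new task, sample inputs from the old generator, label them with the old learner, and retrain both on new data plus replay.

```python
import numpy as np


class NearestMeanLearner:
    """Toy learner: predicts the class whose training mean is closest."""

    def fit(self, x, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([x[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, x):
        d = ((x[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=-1)
        return self.classes_[d.argmin(axis=1)]


class GaussianMixtureGenerator:
    """Toy generator: one diagonal Gaussian per class, mixed by class frequency.
    Hypothetical stand-in for the VAE generator used in DGR; NOT the paper's model."""

    def fit(self, x, y):
        classes = np.unique(y)
        self.means_ = np.stack([x[y == c].mean(axis=0) for c in classes])
        self.stds_ = np.stack([x[y == c].std(axis=0) + 1e-6 for c in classes])
        self.weights_ = np.array([(y == c).mean() for c in classes])
        return self

    def sample(self, n, rng):
        k = rng.choice(len(self.weights_), size=n, p=self.weights_)
        return rng.normal(self.means_[k], self.stds_[k])  # inputs only, no labels


def train_task(memory, x_new, y_new, n_replay, rng):
    """One DGR-style step: replay old inputs, label them with the old learner,
    then retrain generator and learner on new data plus replay."""
    generator, learner = memory
    if generator is not None:
        x_rep = generator.sample(n_replay, rng)
        y_rep = learner.predict(x_rep)
        x_new = np.concatenate([x_new, x_rep])
        y_new = np.concatenate([y_new, y_rep])
    return GaussianMixtureGenerator().fit(x_new, y_new), NearestMeanLearner().fit(x_new, y_new)


rng = np.random.default_rng(0)


def make_task(classes, n=200):
    # Hypothetical toy data: class c is a 2-D Gaussian blob centered at (3c, 0).
    x = np.concatenate([rng.normal([3.0 * c, 0.0], 0.5, size=(n, 2)) for c in classes])
    y = np.repeat(classes, n)
    return x, y


tasks = [make_task([0, 1]), make_task([2, 3]), make_task([4, 5])]
memory = (None, None)
for x, y in tasks:
    memory = train_task(memory, x, y, n_replay=len(x), rng=rng)

x0, y0 = tasks[0]
acc_replay = (memory[1].predict(x0) == y0).mean()
acc_naive = (NearestMeanLearner().fit(*tasks[-1]).predict(x0) == y0).mean()
print(f"task-1 accuracy with replay: {acc_replay:.2f}, without: {acc_naive:.2f}")
```

A learner trained only on the last task cannot predict the first task's classes at all. The replayed memory still separates them, which is the forgetting-avoidance effect generative replay is meant to provide.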

Abstract

Despite advances in deep learning, neural networks can only learn multiple tasks when trained on them jointly. When tasks arrive sequentially, they lose performance on previously learnt tasks. This phenomenon, called catastrophic forgetting, is a fundamental challenge to overcome before neural networks can learn continually from incoming data. In this work, we draw inspiration from human memory to develop an architecture capable of learning continuously from sequentially incoming tasks, while averting catastrophic forgetting. Specifically, our contributions are: (i) a dual memory architecture emulating the complementary learning systems (hippocampus and neocortex) in the human brain, (ii) memory consolidation via generative replay of past experiences, (iii) demonstrating advantages of generative replay and dual memories via experiments, and (iv) improved performance retention on challenging tasks even for low-capacity models. Our architecture displays many characteristics of mammalian memory and provides insights into the connection between sleep and learning.
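The dual memory architecture (contribution i), combined with consolidation via generative replay (contribution ii), can be sketched as a small controller. This is a minimal sketch under stated assumptions: each memory is a hypothetical `GaussianMemory` (per-class Gaussians that both generate and classify) rather than a VAE-plus-learner. `DualMemory`, `n_slots`, `n_replay`, routing by task id, and replaying the old LTM in proportion to the tasks it holds are all illustrative choices, not the paper's exact procedure. Each new task gets its own short-term task memory. When all slots are full, a "sleep" phase retrains the LTM on replay from every STTM and the old LTM, then frees the slots.

```python
import numpy as np


class GaussianMemory:
    """Toy generative memory: per-class diagonal Gaussians that both generate
    inputs for replay and classify by nearest class mean (hypothetical stand-in)."""

    def fit(self, x, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([x[y == c].mean(axis=0) for c in self.classes_])
        self.stds_ = np.stack([x[y == c].std(axis=0) + 1e-6 for c in self.classes_])
        return self

    def predict(self, x):
        d = ((x[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=-1)
        return self.classes_[d.argmin(axis=1)]

    def sample(self, n, rng):
        k = rng.integers(len(self.classes_), size=n)
        x = rng.normal(self.means_[k], self.stds_[k])
        return x, self.predict(x)  # replayed inputs are labeled by the memory itself


class DualMemory:
    """DGDMN-style controller sketch: one small memory per new task (STTM),
    consolidated into a single long-term memory (LTM) once all slots are full."""

    def __init__(self, n_slots, n_replay, seed=0):
        self.n_slots, self.n_replay = n_slots, n_replay
        self.rng = np.random.default_rng(seed)
        self.stms = {}  # task_id -> GaussianMemory (short-term task memories)
        self.ltm = None
        self.ltm_tasks = set()
        self.n_sleeps = 0

    def learn_task(self, task_id, x, y):
        self.stms[task_id] = GaussianMemory().fit(x, y)  # fast, task-local learning
        if len(self.stms) == self.n_slots:
            self.sleep()

    def sleep(self):
        """Consolidation: retrain the LTM on replay from every STTM and the old LTM."""
        batches = [m.sample(self.n_replay, self.rng) for m in self.stms.values()]
        if self.ltm is not None:
            # Assumed heuristic: replay the old LTM in proportion to the tasks it holds.
            batches.append(self.ltm.sample(self.n_replay * len(self.ltm_tasks), self.rng))
        xs, ys = zip(*batches)
        self.ltm = GaussianMemory().fit(np.concatenate(xs), np.concatenate(ys))
        self.ltm_tasks |= set(self.stms)
        self.stms.clear()
        self.n_sleeps += 1

    def predict(self, task_id, x):
        # Route to the task's STTM while it exists, otherwise to the LTM.
        return self.stms.get(task_id, self.ltm).predict(x)


rng = np.random.default_rng(1)


def make_task(t, n=100):
    # Hypothetical toy task t: classes 2t and 2t+1 as blobs at (6t, 0) and (6t+3, 0).
    x = np.concatenate([rng.normal([6.0 * t + 3.0 * i, 0.0], 0.5, size=(n, 2)) for i in (0, 1)])
    return x, np.repeat([2 * t, 2 * t + 1], n)


tasks = [make_task(t) for t in range(4)]
dm = DualMemory(n_slots=2, n_replay=200)
for t, (x, y) in enumerate(tasks):
    dm.learn_task(t, x, y)
accs = [(dm.predict(t, x) == y).mean() for t, (x, y) in enumerate(tasks)]
print(dm.n_sleeps, [round(a, 2) for a in accs])
```

With two slots and four tasks, consolidation runs twice. After the second sleep all four tasks are answered by the LTM alone. The design choice this mirrors is that new tasks are learned quickly in small, isolated memories, while the costlier replay-based update of the large memory happens only periodically.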

Paper Structure

This paper contains 22 sections, 2 equations, 14 figures, 2 tables, and 1 algorithm.

Figures (14)

  • Figure 1: Deep Generative Replay to train a Deep Generative Memory
  • Figure 2: Deep Generative Dual Memory Network (DGDMN)
  • Figure 3: Accuracy curves for Permnist (x: tasks seen, y: classification accuracy on task).
  • Figure 4: Accuracy curves for Digits (x: tasks seen, y: classification accuracy on task).
  • Figure 5: Forgetting curves (x: tasks seen, y: avg classification accuracy on tasks seen).
  • ...and 9 more figures
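The accuracy and forgetting curves listed above can be computed from a matrix of per-task accuracies. The sketch below assumes a hypothetical layout `acc[i][j]`, the accuracy on task j after training on tasks 0..i. `forgetting_curve` gives the "average accuracy on tasks seen" plotted in Figure 5. `task_curve` gives one line of a per-task accuracy plot like Figures 3 and 4. The numbers are made up for illustration and are not results from the paper.

```python
import numpy as np


def forgetting_curve(acc):
    """acc[i, j] = accuracy on task j after training on tasks 0..i (j <= i used).
    Entry i of the result is the mean accuracy over the i + 1 tasks seen so far."""
    acc = np.asarray(acc, dtype=float)
    return np.array([acc[i, : i + 1].mean() for i in range(acc.shape[0])])


def task_curve(acc, j):
    """Accuracy on task j as further tasks are learned (one line of an accuracy plot)."""
    return np.asarray(acc, dtype=float)[j:, j]


# Made-up illustrative numbers, not results from the paper.
acc = [[0.98, 0.00, 0.00],
       [0.90, 0.97, 0.00],
       [0.85, 0.92, 0.96]]
print(forgetting_curve(acc))  # approx. [0.98, 0.935, 0.91]
print(task_curve(acc, 0))     # first task's accuracy as later tasks are added
```

A flatter forgetting curve means less forgetting. Entries above the diagonal (tasks not yet seen) are ignored by construction.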