Information-Theoretic Dual Memory System for Continual Learning

RunQing Wu, KaiHui Huang, HanYi Zhang, QiHe Liu, GuoJin Yu, JingSong Deng, Fei Ye

TL;DR

This work tackles catastrophic forgetting in continual learning by introducing the Information-Theoretic Dual Memory System (ITDMS), a plug-and-play architecture that pairs a fast reservoir memory for recent data with a slow memory for informative long-term samples. ITDMS employs an information-theoretic memory optimization (ITMO) that uses the second-order Rényi entropy $H_2$ and the Cauchy–Schwarz divergence $D_{CS}$ to select diverse, representative samples. It is complemented by a Balanced Sample Selection (BSS) procedure that keeps the memory category-balanced across task switches. The framework integrates with existing replay-based baselines (e.g., DER++) and achieves state-of-the-art performance on balanced and imbalanced data streams in the Task-IL, Class-IL, and Domain-IL settings, with notable stability gains under constrained memory budgets. The paper also provides ablations and an analysis of sample-weight dynamics, which highlight the benefits of combining dual memory with information-theoretic selection and balanced removal. It closes by discussing future directions such as dynamic memory expansion and efficiency improvements.
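The TL;DR names two information-theoretic quantities. A common way to estimate them from a finite set of feature vectors uses Gaussian Parzen windows, as in information-theoretic learning. The quadratic "information potential" $V(X,Y)=\frac{1}{NM}\sum_i\sum_j G_{\sigma\sqrt{2}}(x_i-y_j)$ gives $H_2(X)=-\log V(X,X)$ and $D_{CS}(X,Y)=-\log\frac{V(X,Y)^2}{V(X,X)\,V(Y,Y)}$. The sketch below computes these estimators in plain Python. The Gaussian kernel, the bandwidth `sigma`, and all function names are illustrative assumptions, not the paper's exact formulation. Intuitively, a high $H_2$ indicates a diverse sample set, and a low $D_{CS}$ between memory and task data indicates a representative one.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float) -> float:
    """Isotropic d-dimensional Gaussian density G_sigma evaluated at x - y."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    norm = (2.0 * math.pi * sigma ** 2) ** (d / 2.0)
    return math.exp(-sq_dist / (2.0 * sigma ** 2)) / norm


def information_potential(xs: List[Vector], ys: List[Vector], sigma: float) -> float:
    """V(X, Y): inner product of the two Parzen density estimates.

    Convolving two Gaussians of width sigma gives one of width sigma * sqrt(2).
    """
    s = sigma * math.sqrt(2.0)
    total = sum(gaussian_kernel(x, y, s) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def renyi_quadratic_entropy(xs: List[Vector], sigma: float) -> float:
    """H_2(X) = -log V(X, X); larger values mean a more spread-out sample set."""
    return -math.log(information_potential(xs, xs, sigma))


def cauchy_schwarz_divergence(xs: List[Vector], ys: List[Vector], sigma: float) -> float:
    """D_CS(X, Y) = -log(V(X,Y)^2 / (V(X,X) V(Y,Y))).

    By the Cauchy-Schwarz inequality this is >= 0, and it equals 0 when both
    sets yield the same density estimate.
    """
    v_xy = information_potential(xs, ys, sigma)
    v_xx = information_potential(xs, xs, sigma)
    v_yy = information_potential(ys, ys, sigma)
    return -math.log(v_xy ** 2 / (v_xx * v_yy))
```

The estimators are quadratic in the number of samples, which is acceptable for memory buffers of a few hundred items. For very distant sets, `v_xy` can underflow to zero, so a practical version would work in log space.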

Abstract

Continuously acquiring new knowledge from a dynamic environment is a fundamental capability of animals, supporting their survival and their ability to address various challenges. This capability is referred to as continual learning, which focuses on learning a sequence of tasks without losing previously acquired knowledge. A prevalent strategy for continual learning is to select and store essential data samples from prior tasks in a fixed-size memory buffer. However, most current memory-based techniques use a single memory buffer, which makes it difficult to manage newly acquired and previously learned samples concurrently. Drawing inspiration from the Complementary Learning Systems (CLS) theory, which describes rapid and gradual learning mechanisms for processing information, we propose a dual memory system called the Information-Theoretic Dual Memory System (ITDMS). This system comprises a fast memory buffer that retains temporary and novel samples and a slow memory buffer that preserves critical and informative samples. The fast memory buffer is maintained by an efficient reservoir sampling process. Furthermore, we introduce a novel information-theoretic memory optimization strategy that selectively identifies and retains diverse and informative data samples for the slow memory buffer. Additionally, we propose a novel balanced sample selection procedure that automatically identifies and eliminates redundant memorized samples. This frees memory capacity for new data so that the system can handle a growing number of tasks. Our methodology is rigorously evaluated through a series of continual learning experiments, and the empirical results demonstrate the effectiveness of the proposed system.
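The abstract states that the fast memory buffer is maintained by reservoir sampling. Below is a minimal sketch of the classic reservoir update (Algorithm R), assuming a fixed capacity $k$ and one sample arriving at a time. The class name `ReservoirBuffer` and its fields are hypothetical. After $n$ samples have been seen, each one is in the buffer with probability $\min(1, k/n)$.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class ReservoirBuffer(Generic[T]):
    """Fixed-capacity memory updated with reservoir sampling (Algorithm R)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[T] = []
        self.num_seen = 0  # number of stream samples observed so far
        self._rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.num_seen += 1
        if len(self.items) < self.capacity:
            # Buffer not yet full: always store the new sample.
            self.items.append(item)
            return
        # Otherwise draw j uniformly from {0, ..., num_seen - 1}. The new sample
        # replaces slot j only if j < capacity, i.e. with probability capacity / num_seen.
        j = self._rng.randrange(self.num_seen)
        if j < self.capacity:
            self.items[j] = item


# Usage: stream 1000 samples into a 200-slot fast memory.
fast_memory = ReservoirBuffer(capacity=200, seed=0)
for sample in range(1000):
    fast_memory.add(sample)
```

The update needs no task boundaries and costs O(1) per sample. Plain reservoir sampling keeps a uniform sample of the whole stream seen so far. It does not favor the most recent items.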

Paper Structure

This paper contains 21 sections, 19 equations, 9 figures, 5 tables, and 2 algorithms.

Figures (9)

  • Figure 1: The training procedure of the proposed dual memory system, which consists of a fast memory buffer and a slow memory buffer. The fast memory buffer continually stores new data samples, or replaces previously memorized samples with new ones, via reservoir sampling. When the model is trained on the $i$-th task ($i>1$), we first remove samples from the slow memory buffer using the sample-removal equation. We then optimize the sample selection probabilities using the weight-optimization equation and add data samples from $D^s_i$ according to the selection weight vector $\mathbf{w}$.
  • Figure 2: The optimization process of the proposed balanced sample selection approach. The first step defines the central sample of each category in the slow memory buffer. The second step estimates a diversity score for each memorized sample in the slow memory buffer. Finally, we maintain a balanced label distribution in the slow memory while removing duplicate or similar samples (those with small average distances), thereby maximizing the diversity of the stored samples.
  • Figure 3: Performance comparison between the proposed ITDMS framework and baselines in imbalanced data stream scenarios. All results are averaged over 10 runs. (a), (b), and (c): results of various models on the Split-MNIST dataset with memory sizes of 200, 500, and 1000, respectively. (d), (e), and (f): results of various models on the Split-CIFAR10 dataset with memory sizes of 200, 500, and 1000, respectively.
  • Figure 4: Forgetting analysis of various models. (a), (b), and (c) report the results on Split-MNIST, Split-CIFAR10, and R-MNIST, respectively. Each subfigure corresponds to one dataset and covers three memory sizes, 200, 500, and 1000, from left to right.
  • Figure 5: Changes in the distribution of the sample selection weights during optimization. The horizontal axis shows the sample-weight values, and the vertical axis shows the corresponding proportion of samples. (a) All weights are initialized to 0.1, so every sample is assigned the same selection weight. (b-g) The weight distribution passes through a global search phase and a weight sparsification phase as the model evaluates the importance of the selected samples. (h) The weights eventually converge, with most values close to 0 and a few close to 1.
  • ...and 4 more figures
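The Figure 2 caption outlines balanced sample selection in three steps. The sketch below is one possible reading of those steps, not the paper's algorithm. It makes three assumptions. First, the "central sample" of a category is the memorized sample nearest to the category mean, and it is protected from removal. Second, the diversity score of a sample is its mean Euclidean distance to the other samples of the same category. Third, eviction repeatedly takes the least diverse sample from the currently largest category. The function `balanced_removal` and its helpers are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _avg_dist(i: int, members: List[int], features: Sequence[Vector]) -> float:
    """Hypothetical diversity score: mean distance from sample i to its classmates."""
    others = [j for j in members if j != i]
    if not others:
        return math.inf
    return sum(_dist(features[i], features[j]) for j in others) / len(others)


def balanced_removal(features: Sequence[Vector], labels: Sequence[int],
                     num_to_remove: int) -> List[int]:
    """Return the indices of slow-memory samples to evict (illustrative sketch)."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Step 1: the central sample of each category is the member closest to the
    # category mean. It is never evicted, so every category keeps a representative.
    central: Set[int] = set()
    for members in by_class.values():
        dim = len(features[members[0]])
        mean = [sum(features[i][d] for i in members) / len(members) for d in range(dim)]
        central.add(min(members, key=lambda i: _dist(features[i], mean)))

    removed: List[int] = []
    for _ in range(num_to_remove):
        # Only categories that still hold a non-central sample can shrink.
        candidates = [y for y, members in by_class.items()
                      if any(i not in central for i in members)]
        if not candidates:
            break
        # Balance: take from the currently largest category.
        y = max(candidates, key=lambda c: len(by_class[c]))
        members = by_class[y]
        # Steps 2 and 3: evict the sample with the smallest average distance.
        # Scores are recomputed each time, so a removed duplicate's twin is
        # no longer scored as redundant.
        victim = min((i for i in members if i not in central),
                     key=lambda i: _avg_dist(i, members, features))
        members.remove(victim)
        removed.append(victim)
    return removed
```

In this reading, removal first shrinks over-represented categories and, within a category, discards near-duplicates before outliers. This matches the caption's goal of a balanced and diverse slow memory. The quadratic scoring cost is modest at the buffer sizes reported (200 to 1000 samples).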