Distribution-Level Memory Recall for Continual Learning: Preserving Knowledge and Avoiding Confusion

Shaoxu Cheng; Kanglei Geng; Chiyuan He; Zihuan Qiu; Linfeng Xu; Heqian Qiu; Lanxiao Wang; Qingbo Wu; Fanman Meng; Hongliang Li

TL;DR

This work tackles catastrophic forgetting in continual learning by preserving the distribution of old knowledge at the feature level rather than relying solely on class centers. It introduces Distribution-level Memory Recall (DMR), which uses a Gaussian Mixture Model to fit old feature distributions and generate faithful pseudo features for the next incremental stage, with adaptive Gaussian component selection and covariance degradation to limit storage. To mitigate interference between old and new knowledge, Incremental Mixup Feature Enhancement (IMFE) blends new-class features with old-prototype information, while Inter-Modal Guidance and Intra-Modal Mining (IGIM) addresses multimodal imbalance by guiding weaker modalities with dominant ones and mining useful information within each modality. Extensive experiments on CIFAR100, ImageNet100, and UESTC-MMEA-CL show state-of-the-art performance, and ablation studies validate the contribution of each component, highlighting the method's practical value for scalable, privacy-preserving, exemplar-free continual learning in multimodal settings.
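To make the storage side of covariance degradation concrete, the sketch below counts the floats stored per class for a K-component Gaussian mixture in a d-dimensional feature space. It assumes that "covariance degradation" means replacing each full covariance matrix with a diagonal or a single scalar variance; the paper's exact degradation scheme, the helper name `gmm_storage_floats`, and the values d = 512 and K = 3 are illustrative assumptions, not figures from the paper.

```python
def gmm_storage_floats(k: int, d: int, cov: str) -> int:
    """Floats stored for one class's K-component GMM in d dimensions.

    Each component keeps a mean (d floats), a mixing weight (1 float) and a
    covariance whose size depends on the (assumed) degradation level.
    """
    cov_sizes = {
        "full": d * (d + 1) // 2,  # symmetric matrix: store the upper triangle only
        "diag": d,                 # per-dimension variances only
        "spherical": 1,            # one shared variance per component
    }
    if cov not in cov_sizes:
        raise ValueError(f"unknown covariance type: {cov}")
    return k * (d + 1 + cov_sizes[cov])


d, k = 512, 3  # hypothetical feature dimension and component count
for cov in ("full", "diag", "spherical"):
    print(cov, gmm_storage_floats(k, d, cov))
print("prototype only", d)  # a single class centroid, for comparison
```

In this example, degrading full covariances to diagonal ones reduces the per-class cost from 395,523 to 3,075 floats, more than two orders of magnitude, while still storing far more distributional information than a 512-float centroid.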

Abstract

Continual Learning (CL) aims to enable Deep Neural Networks (DNNs) to learn new data without forgetting previously learned knowledge. The key to achieving this goal is avoiding confusion at the feature level, both within old tasks and between new and old tasks. Previous prototype-based CL methods generate pseudo features for old-knowledge replay by adding Gaussian noise to the centroids of old classes. However, the feature distribution becomes anisotropic during the incremental process, so these pseudo features cannot faithfully reproduce the distribution of old knowledge in the feature space, which blurs the classification boundaries within old tasks. To address this issue, we propose the Distribution-Level Memory Recall (DMR) method, which uses a Gaussian mixture model to precisely fit the feature distribution of old knowledge and generates pseudo features from it in the next stage. Resistance to confusion at the distribution level is also crucial for multimodal learning: multimodal imbalance produces significantly different feature responses across modalities, which exacerbates confusion within old tasks in prototype-based CL methods. We therefore mitigate multimodal imbalance with the Inter-modal Guidance and Intra-modal Mining (IGIM) method, which guides weaker modalities with prior information from dominant modalities and further mines useful information within each modality. For the second key aspect, confusion between new and old tasks, we propose the Confusion Index to quantify a model's ability to distinguish between new and old tasks, and we use the Incremental Mixup Feature Enhancement (IMFE) method to enhance pseudo features with new-sample features, alleviating classification confusion between new and old knowledge.
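The contrast between centroid-plus-noise replay and distribution-level replay can be illustrated with a small NumPy sketch. It fits a Gaussian mixture to one class's features with a few EM iterations, samples pseudo features from it, and compares them with a prototype baseline that adds isotropic Gaussian noise with a single scalar radius to the class centroid. The diagonal covariances, the fixed number of components, the EM initialization, the radius rule, the helper names, and the synthetic two-cluster anisotropic data are all assumptions made for illustration; this is not the authors' DMR implementation.

```python
import numpy as np


def fit_diag_gmm(x, k, n_iter=50, reg=1e-6, seed=0):
    """Fit a K-component GMM with diagonal covariances to x (N, D) by EM."""
    rng = np.random.default_rng(seed)
    n, _ = x.shape
    means = x[rng.choice(n, size=k, replace=False)]   # init means at random samples
    variances = np.tile(x.var(axis=0) + reg, (k, 1))  # init with global per-dim variance
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log pi_k + log N(x | mu_k, diag(var_k)), normalized in log space
        diff = x[:, None, :] - means[None, :, :]                      # (N, K, D)
        log_p = np.log(weights) - 0.5 * (diff**2 / variances
                                         + np.log(2 * np.pi * variances)).sum(axis=2)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)                       # (N, K)
        # M-step: closed-form updates of weights, means and diagonal variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        diff = x[:, None, :] - means[None, :, :]
        variances = (resp[:, :, None] * diff**2).sum(axis=0) / nk[:, None] + reg
    return weights, means, variances


def sample_gmm(weights, means, variances, n, rng):
    """Draw n pseudo features: pick a component, then sample its Gaussian."""
    comp = rng.choice(len(weights), size=n, p=weights)
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comp] + np.sqrt(variances[comp]) * noise


def sample_isotropic(x, n, rng):
    """Baseline: class centroid plus isotropic noise with one scalar radius."""
    radius = np.sqrt(x.var(axis=0).mean())  # assumed radius rule
    return x.mean(axis=0) + radius * rng.standard_normal((n, x.shape[1]))


def max_rel_std_error(real, fake):
    """Largest relative error between per-dimension standard deviations."""
    return np.max(np.abs(fake.std(axis=0) - real.std(axis=0)) / real.std(axis=0))


rng = np.random.default_rng(0)
# Synthetic "old class": two clusters along dim 0, very different spreads per dim.
centers = np.array([[-4.0, 0.0, 0.0, 0.0], [4.0, 0.0, 0.0, 0.0]])
spread = np.array([1.0, 1.0, 0.2, 0.2])
feats = centers[rng.integers(0, 2, size=2000)] + spread * rng.standard_normal((2000, 4))

gmm = fit_diag_gmm(feats, k=2)
gmm_pseudo = sample_gmm(*gmm, n=4000, rng=rng)
iso_pseudo = sample_isotropic(feats, n=4000, rng=rng)
print("GMM max rel. std error:      ", round(max_rel_std_error(feats, gmm_pseudo), 3))
print("isotropic max rel. std error:", round(max_rel_std_error(feats, iso_pseudo), 3))
```

Each M-step makes the mixture's per-dimension mean and variance equal to those of the data (up to the small regularizer), so the GMM pseudo features reproduce the anisotropic spread, whereas a single radius underestimates the wide, bimodal dimension and overestimates the narrow ones.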

Paper Structure (17 sections, 15 equations, 12 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Continual Learning
Multimodal Continual Learning
Methodology
Problem Statement
Distribution-level Memory Recall
Avoiding Old-New Confusion
Generalization Is Prerequisite.
Experiments
Datasets
Implementation Details
Analysis on DMR and DMR-L Method
Analysis on IMFE Method
Comparison with Benchmarks
...and 2 more sections

Figures (12)

Figure 1: t-SNE [van2008visualizing] visualization of the feature distribution: (a) the actual embedding-space distribution of samples from four classes; (b) pseudo features generated during the incremental stage from the class centers and deviations stored in (a); (c) pseudo features generated during the incremental stage by our method.
Figure 2: Illustration of the motivation behind the distribution-level memory recall method. When reproducing old knowledge, we aim to avoid significant information loss and inter-class confusion by preserving its original distribution during memory recall.
Figure 3: The relationship between the level of confusion and the plasticity and stability on old and new tasks. When the coefficient of the distillation loss is controlled, the compared methods in (a) and (b) both suffer from severe confusion; PASS better alleviates the plasticity-stability dilemma, but confusion between new and old classes remains; our method achieves better results by avoiding confusion between new and old classes.
Figure 4: Unlike the two losses in previous prototype-based methods, we enhance the pseudo features using samples from the new task, increasing the discriminative ability between new and old tasks.
Figure 5: Detailed architecture of IGIM. It alleviates the imbalance among sensor modalities, caused by insufficient optimization of the weak sensor modalities, by enhancing information along the time and frequency dimensions.
...and 7 more figures
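Figure 4 describes IMFE as enhancing pseudo features with features of new-task samples. One plausible reading is a mixup between old-class pseudo features and new-class features; the sketch below assumes standard mixup, with a Beta-distributed coefficient and correspondingly mixed soft labels. The function name `incremental_mixup`, the `alpha` parameter, the random pairing, and the label-mixing rule are hypothetical, and the paper's actual mixing scheme may differ.

```python
import numpy as np


def incremental_mixup(pseudo_old, y_old, new_feats, y_new, num_classes,
                      alpha=1.0, rng=None):
    """Mix old-class pseudo features with randomly paired new-class features.

    Returns mixed features and soft targets; lam ~ Beta(alpha, alpha) per pair.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = min(len(pseudo_old), len(new_feats))
    pair = rng.permutation(len(new_feats))[:n]   # random new-class partners
    lam = rng.beta(alpha, alpha, size=(n, 1))    # per-pair mixing weight
    mixed = lam * pseudo_old[:n] + (1.0 - lam) * new_feats[pair]
    one_hot = np.eye(num_classes)
    targets = lam * one_hot[y_old[:n]] + (1.0 - lam) * one_hot[y_new[pair]]
    return mixed, targets


rng = np.random.default_rng(0)
pseudo_old = rng.standard_normal((8, 16))         # pseudo features of old classes 0-4
y_old = rng.integers(0, 5, size=8)
new_feats = rng.standard_normal((10, 16)) + 3.0   # features of new classes 5-9
y_new = rng.integers(5, 10, size=10)
mixed, targets = incremental_mixup(pseudo_old, y_old, new_feats, y_new,
                                   num_classes=10, rng=rng)
print(mixed.shape, targets.shape)  # (8, 16) (8, 10)
```

Under these assumptions, the mixed samples lie between old-class and new-class regions of the feature space, so training on them with soft targets is intended to place supervision where old and new classes are most easily confused.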
