
Memory-efficient Continual Learning with Neural Collapse Contrastive

Trung-Anh Dang, Vincent Nguyen, Ngoc-Son Vu, Christel Vrain

TL;DR

This work addresses catastrophic forgetting in continual learning by balancing soft inter-sample relationships with hard sample-prototype relationships. It introduces Focal Neural Collapse Contrastive (FNC^2), which drives plasticity by focusing on hard samples, and Hardness-Softness Distillation (HSD), which preserves knowledge across tasks; both losses rely on fixed Equiangular Tight Frame (ETF) prototypes. Together, these Neural Collapse (NC)-inspired losses reduce dependence on memory: the method performs strongly in memory-free settings and also works well with small memory buffers. It achieves superior or competitive results on Seq-CIFAR-10/100 and Seq-Tiny-ImageNet, which makes it practical for privacy-preserving, memory-constrained continual learning.
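The fixed prototypes mentioned above form a simplex Equiangular Tight Frame: K unit vectors whose pairwise cosine similarity is exactly -1/(K-1), so every pair of classes is equally and maximally separated. A minimal NumPy sketch of the standard construction M = sqrt(K/(K-1)) U (I_K - (1/K) 1 1^T), where U is a random d×K matrix with orthonormal columns, is shown below. The function name `simplex_etf`, the random seed, and the requirement d ≥ K are assumptions of this illustration, not details taken from the paper.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_classes) matrix whose columns form a simplex ETF.

    Hypothetical helper for illustration; assumes dim >= num_classes.
    """
    if dim < num_classes:
        raise ValueError("this construction needs dim >= num_classes")
    k = num_classes
    rng = np.random.default_rng(seed)
    # U: (dim, K) with orthonormal columns (U^T U = I_K), via reduced QR.
    u, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    # Centering matrix I_K - (1/K) 11^T removes the common mean direction.
    centering = np.eye(k) - np.ones((k, k)) / k
    # Scaling by sqrt(K/(K-1)) makes every column unit-norm.
    return np.sqrt(k / (k - 1)) * (u @ centering)


M = simplex_etf(num_classes=10, dim=64)
gram = M.T @ M  # diagonal = 1, off-diagonal = -1/9
```

Since the matrix is generated once and never updated, each class keeps the same target direction throughout training.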

Abstract

Contrastive learning has significantly improved representation quality, enhancing knowledge transfer across tasks in continual learning (CL). However, catastrophic forgetting remains a key challenge: contrastive-based methods primarily focus on "soft relationships" (or "softness") between samples, which shift as the data distribution changes and lead to representation overlap across tasks. More recently, the Neural Collapse (NC) phenomenon has shown promise in CL by focusing on "hard relationships" (or "hardness") between samples and fixed prototypes. However, this approach overlooks softness, which is crucial for capturing intra-class variability, and its rigid alignment can also pull old-class representations toward current ones, increasing forgetting. Building on these insights, we propose Focal Neural Collapse Contrastive (FNC^2), a novel representation learning loss that balances soft and hard relationships. We also introduce the Hardness-Softness Distillation (HSD) loss to progressively preserve the knowledge captured by these relationships across tasks. Our method outperforms state-of-the-art approaches, particularly in reducing reliance on memory. Remarkably, even without any memory buffer, our approach rivals rehearsal-based methods, offering a compelling solution where data privacy is a concern.
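This summary does not give the formula for FNC^2, only that it targets hard samples and balances soft and hard relationships. As an illustration only, the sketch below shows the generic focal-loss weighting (1 - p)^γ applied to the hard sample-prototype term: each embedding is scored by its cosine similarity to fixed prototypes, and samples that are far from their own prototype get larger weight. It omits the soft inter-sample contrastive part entirely. The function name, the temperature `tau`, and the default `gamma` are hypothetical, and it is an assumption that this γ plays the same role as the γ studied in Figure 4.

```python
import numpy as np


def focal_prototype_loss(features: np.ndarray, labels: np.ndarray,
                         prototypes: np.ndarray, gamma: float = 2.0,
                         tau: float = 0.1) -> float:
    """Focal-weighted cross-entropy over cosine similarities to fixed prototypes.

    Hypothetical sketch, not the paper's FNC^2 loss.
    features:   (N, d) sample embeddings
    labels:     (N,) integer class ids
    prototypes: (d, K) fixed class prototypes (e.g. a simplex ETF)
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = prototypes / np.linalg.norm(prototypes, axis=0, keepdims=True)
    logits = (z @ w) / tau                       # (N, K) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p = log_probs[np.arange(len(labels)), labels]  # log-prob of own prototype
    p = np.exp(log_p)
    # Hard samples (small p) keep weight near 1; easy ones (p near 1) are
    # down-weighted by (1 - p) ** gamma. gamma = 0 gives plain cross-entropy.
    weights = (1.0 - p) ** gamma
    return float(np.mean(-weights * log_p))


protos = np.eye(4)          # toy prototypes: d = K = 4
feats = np.eye(4) + 0.1     # each row points mostly at one prototype
aligned = focal_prototype_loss(feats, np.array([0, 1, 2, 3]), protos)
misaligned = focal_prototype_loss(feats, np.array([1, 2, 3, 0]), protos)
```

Because the weights never exceed 1, the focal version is never larger than the unweighted cross-entropy; it only redistributes emphasis toward poorly aligned samples.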

Paper Structure

This paper contains 22 sections, 12 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Our method mitigates the drawbacks of both "soft" and "hard" learning by using fixed, equidistant prototypes to minimize cluster overlap. (a) Overlap between new and old classes arises from non-fixed class clusters and representation drift. (b) Strong alignment with prototypes can over-concentrate current-task representations and pull older ones into new clusters, especially in CL with few stored old samples. Moreover, samples with mixed features should lie between classes rather than be tightly aligned with a prototype. (c) Our method considers both inter-sample and sample-prototype relationships, keeping cluster representations distinct while preserving the distribution within each cluster.
  • Figure 2: Shifts in representations of current samples at task $t$ between the beginning of training and several epochs later.
  • Figure 3: Overall architecture of our method. Augmented samples from each batch are fed into the current model $f^t$ to learn new knowledge via $\mathcal{L}_{FNC^2}$ and the frozen previous model $f^{t-1}$ for distillation using $\mathcal{L}_{HSD}$. The buffer is optional, and NC-based prototypes are directly involved in both loss functions during training.
  • Figure 4: Test accuracy over different values of $\gamma$.
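Figure 3 describes distilling from the frozen previous model $f^{t-1}$, with the NC prototypes involved in the loss. The exact form of HSD is not given here, so the following is only a generic relational-distillation sketch under that reading: for each sample, both models produce a softmax distribution over its similarities to the other batch samples ("soft" relations) and to the fixed prototypes ("hard" relations), and the loss is the KL divergence from the old model's distribution to the current one's. All function names and the temperature `tau` are hypothetical.

```python
import numpy as np


def _relation_log_probs(feats: np.ndarray, prototypes: np.ndarray,
                        tau: float) -> np.ndarray:
    """Row-wise log-softmax over [sample-sample | sample-prototype] similarities."""
    z = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    w = prototypes / np.linalg.norm(prototypes, axis=0, keepdims=True)
    n = z.shape[0]
    # Columns 0..n-1: "soft" similarities to the other samples in the batch;
    # columns n..n+K-1: "hard" similarities to the fixed prototypes.
    sims = np.concatenate([z @ z.T, z @ w], axis=1) / tau
    sims[np.arange(n), np.arange(n)] = -1e9  # mask each sample's self-similarity
    sims -= sims.max(axis=1, keepdims=True)
    return sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))


def relation_distillation(old_feats: np.ndarray, new_feats: np.ndarray,
                          prototypes: np.ndarray, tau: float = 0.1) -> float:
    """Mean KL(old || new) between per-sample relation distributions (hypothetical)."""
    log_q_old = _relation_log_probs(old_feats, prototypes, tau)  # frozen f^{t-1}
    log_q_new = _relation_log_probs(new_feats, prototypes, tau)  # current f^t
    q_old = np.exp(log_q_old)  # masked self-entries underflow to 0 and drop out
    return float(np.mean(np.sum(q_old * (log_q_old - log_q_new), axis=1)))


rng = np.random.default_rng(0)
protos = np.eye(8)[:, :4]                 # toy (d=8, K=4) prototypes
old = rng.standard_normal((6, 8))         # embeddings from the frozen model
new = old + 0.5 * rng.standard_normal((6, 8))  # drifted embeddings
```

Matching whole similarity distributions, rather than individual embeddings, lets the current model move its features as long as their relations to other samples and to the prototypes stay close to those of the previous model.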