Memory-efficient Continual Learning with Neural Collapse Contrastive
Trung-Anh Dang, Vincent Nguyen, Ngoc-Son Vu, Christel Vrain
TL;DR
This work addresses catastrophic forgetting in continual learning by balancing soft inter-sample relationships with hard sample-prototype relations. It introduces Focal Neural Collapse Contrastive (FNC^2) to drive plasticity by focusing on hard samples, and Hardness-Softness Distillation (HSD) to preserve knowledge across tasks, leveraging fixed Equiangular Tight Frame prototypes. Together, these NC-inspired losses reduce memory dependence and enable strong performance even in memory-free settings, while also performing well with limited memory buffers. The approach demonstrates superior or competitive results on Seq-CIFAR-10/100 and Seq-Tiny-ImageNet, highlighting practical impact for privacy-preserving, memory-constrained continual learning applications.
Abstract
Contrastive learning has significantly improved representation quality, enhancing knowledge transfer across tasks in continual learning (CL). However, catastrophic forgetting remains a key challenge, as contrastive-based methods primarily focus on "soft relationships" or "softness" between samples, which shift with changing data distributions and lead to representation overlap across tasks. The recently identified Neural Collapse (NC) phenomenon has shown promise in CL by focusing on "hard relationships" or "hardness" between samples and fixed prototypes. However, this approach overlooks "softness", which is crucial for capturing intra-class variability, and its rigid focus can also pull old class representations toward current ones, increasing forgetting. Building on these insights, we propose Focal Neural Collapse Contrastive (FNC^2), a novel representation learning loss that balances soft and hard relationships. Additionally, we introduce the Hardness-Softness Distillation (HSD) loss to progressively preserve the knowledge gained from these relationships across tasks. Our method outperforms state-of-the-art approaches, particularly in minimizing memory reliance. Remarkably, even without the use of memory, our approach rivals rehearsal-based methods, offering a compelling solution for data privacy concerns.
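For readers unfamiliar with the fixed prototypes referred to above, the following is a minimal sketch of a simplex Equiangular Tight Frame (ETF), the geometry usually associated with Neural Collapse. It assumes the feature dimension equals the number of classes and uses no rotation matrix; the helper names `simplex_etf` and `dot` are hypothetical and not taken from the paper, whose actual construction may differ.

```python
import math


def simplex_etf(num_classes):
    """Return num_classes unit-norm prototypes forming a simplex ETF.

    Assumption: feature dimension == num_classes and the rotation is the
    identity, so prototype i is sqrt(K/(K-1)) * (e_i - (1/K) * ones).
    """
    if num_classes < 2:
        raise ValueError("a simplex ETF needs at least 2 classes")
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Example: 10 classes, as in a CIFAR-10-sized label space.
prototypes = simplex_etf(10)
self_similarity = dot(prototypes[0], prototypes[0])   # unit norm
cross_similarity = dot(prototypes[0], prototypes[1])  # equals -1/(K-1)
```

Every prototype has unit norm, and every pair of distinct prototypes has the same cosine similarity, -1/(K-1), which is the maximally separated configuration. Because the prototypes are fixed rather than learned, they do not drift as new tasks arrive.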
