Rethinking Continual Learning with Progressive Neural Collapse

Zheng Wang, Wanhao Yu, Li Yang, Sen Lin

TL;DR

The paper proposes ProNC, a novel framework that completely removes the need for a fixed global ETF in CL and significantly outperforms related baselines while maintaining superior flexibility, simplicity, and efficiency.

Abstract

Continual Learning (CL) seeks to build an agent that can continuously learn a sequence of tasks, where a key challenge, namely Catastrophic Forgetting, persists due to the potential knowledge interference among different tasks. On the other hand, deep neural networks (DNNs) have been shown to converge during training to a terminal state termed Neural Collapse, where all class prototypes geometrically form a static simplex equiangular tight frame (ETF). These maximally and equally separated class prototypes make the ETF an ideal target for model learning in CL to mitigate knowledge interference. Thus inspired, several very recent studies leverage a fixed global ETF in CL, but this approach suffers from key drawbacks, such as impracticality and limited performance. To address these challenges and fully unlock the potential of ETF in CL, we propose Progressive Neural Collapse (ProNC), a novel framework that completely removes the need for a fixed global ETF in CL. Specifically, ProNC progressively expands the ETF target in a principled way by adding new class prototypes as vertices for new tasks, ensuring maximal separability across all encountered classes with minimal shifts from the previous ETF. We next develop a new CL framework by plugging ProNC into commonly used CL algorithm designs, where distillation is further leveraged to balance target shifting for old classes against target alignment for new classes. Extensive experiments show that our approach significantly outperforms related baselines while maintaining superior flexibility, simplicity, and efficiency. Our code is available at https://github.com/Continue-Edge-AI-Lab/ProNC
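To make "maximally and equally separated" concrete, here is a minimal NumPy sketch of a simplex ETF, assuming the standard construction from the Neural Collapse literature, $\mathbf{E} = \sqrt{K/(K-1)}\,\mathbf{U}(\mathbf{I}_K - \frac{1}{K}\mathbf{1}_K\mathbf{1}_K^\top)$ with $\mathbf{U}$ having orthonormal columns. The helper name `simplex_etf` is hypothetical and not taken from the ProNC code. Every pair of the $K$ unit-norm prototypes ends up with cosine similarity $-1/(K-1)$, the most negative value that $K$ unit vectors can all share pairwise.

```python
import numpy as np


def simplex_etf(d: int, K: int, seed: int = 0) -> np.ndarray:
    """Return a d x K simplex ETF whose columns are unit-norm class prototypes.

    Standard construction E = sqrt(K/(K-1)) * U @ (I - 11^T / K),
    where U is d x K with orthonormal columns (this version needs d >= K).
    """
    if d < K:
        raise ValueError("this construction needs d >= K")
    rng = np.random.default_rng(seed)
    # Reduced QR of a Gaussian matrix gives a random U with orthonormal columns.
    U, _ = np.linalg.qr(rng.standard_normal((d, K)))
    centering = np.eye(K) - np.ones((K, K)) / K
    return np.sqrt(K / (K - 1)) * U @ centering


E = simplex_etf(d=64, K=10)
gram = E.T @ E  # pairwise cosine similarities, since columns have unit norm
off_diag = gram[~np.eye(10, dtype=bool)]
print(np.allclose(np.diag(gram), 1.0))        # True: every prototype has unit norm
print(np.allclose(off_diag, -1 / (10 - 1)))   # True: every pair has cosine -1/(K-1)
```

A fixed global ETF of this kind must commit to $K$ up front; ProNC instead grows the set of vertices as new tasks arrive.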

Paper Structure

This paper contains 20 sections, 2 theorems, 11 equations, 3 figures, 11 tables.

Key Result

Theorem 1

Let $\mathbf{U'}=\sqrt{\frac{K_1-1}{K_1}}\tilde{\bm{M}}_{K_1}\left(\mathbf{I}_{K_1} - \frac{1}{K_1} \mathbf{1}_{K_1} \mathbf{1}_{K_1}^\top \right)$ and the SVD of $\mathbf{U'}$ is $\mathbf{W\Sigma V}^\top$. Then the ETF matrix $\mathbf{E}^*$ can be obtained as follows:
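The expression for $\mathbf{E}^*$ is not reproduced in this summary. The NumPy sketch below implements one natural reading, stated as an assumption rather than the paper's exact formula: apply Lemma 1 to take the nearest orthonormal matrix $\mathbf{W}\mathbf{V}^\top$ and rebuild the ETF as $\mathbf{E}^* = \sqrt{K_1/(K_1-1)}\,\mathbf{W}\mathbf{V}^\top(\mathbf{I}_{K_1} - \frac{1}{K_1}\mathbf{1}_{K_1}\mathbf{1}_{K_1}^\top)$. Under the standard ETF parameterization this is the Frobenius-nearest ETF to $\tilde{\bm{M}}_{K_1}$. Here `M_tilde` stands for $\tilde{\bm{M}}_{K_1}$, `K` for $K_1$, and the expansion in the usage lines (old prototypes plus random unit vectors for new classes) is hypothetical.

```python
import numpy as np


def centering(K: int) -> np.ndarray:
    # I_K - (1/K) 1 1^T, which subtracts the mean of the K columns.
    return np.eye(K) - np.ones((K, K)) / K


def nearest_etf(M_tilde: np.ndarray) -> np.ndarray:
    """Map a d x K matrix (d >= K) to a simplex ETF, assuming
    E* = sqrt(K/(K-1)) * W @ V^T @ (I - 11^T/K) where U' = W Sigma V^T."""
    _, K = M_tilde.shape
    C = centering(K)
    U_prime = np.sqrt((K - 1) / K) * M_tilde @ C
    W, _, Vt = np.linalg.svd(U_prime, full_matrices=False)
    U_star = W @ Vt  # nearest matrix with orthonormal columns (Lemma 1)
    return np.sqrt(K / (K - 1)) * U_star @ C


# Hypothetical usage: expand an ETF over 5 old classes to 10 classes.
rng = np.random.default_rng(0)
d, K_old, K_new = 64, 5, 10
Q, _ = np.linalg.qr(rng.standard_normal((d, K_old)))
E_old = np.sqrt(K_old / (K_old - 1)) * Q @ centering(K_old)
new_cols = rng.standard_normal((d, K_new - K_old))
new_cols /= np.linalg.norm(new_cols, axis=0)  # unit-norm candidate vertices
E_star = nearest_etf(np.hstack([E_old, new_cols]))
print(np.linalg.norm(E_star[:, :K_old] - E_old))  # how far the old prototypes moved
```

The old prototypes cannot stay fixed exactly, because their pairwise cosine must change from $-1/(K_{\text{old}}-1)$ to $-1/(K_{\text{new}}-1)$; the projection keeps that shift small.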

Figures (3)

  • Figure 1: Accuracy under different sizes of predefined ETF in NCT
  • Figure 2: Overview of our proposed CL framework for new task learning based on the mixture of current task data and replay data. The new model $f_t$ is trained towards the expanded ETF target, with forgetting further reduced by feature distillation.
  • Figure 3: In (a)-(e), the X-axis is the task ID during CL and the Y-axis is the average or standard deviation of the corresponding cosine similarity. In (f), the X-axis is the accuracy (FAA) and the Y-axis is the value of $(\lambda_1, \lambda_2)$.
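Figure 2 describes training $f_t$ toward the expanded ETF while distilling features from the previous model. The NumPy sketch below shows one plausible shape for such an objective, assuming a cosine alignment term toward each sample's ETF prototype plus a feature-distillation term between the new and old models. The function names and the weights `align_weight` and `distill_weight` are hypothetical; this summary does not say how $(\lambda_1, \lambda_2)$ from Figure 3(f) enter the paper's loss.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    # Row-wise L2 normalization of a batch of feature vectors.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def etf_alignment_loss(features: np.ndarray, labels: np.ndarray, etf: np.ndarray) -> float:
    """Mean (1 - cosine) between each feature and its class's ETF prototype.

    features: (n, d); labels: (n,) ints in [0, K); etf: (d, K) with unit-norm columns.
    """
    targets = etf[:, labels].T  # (n, d): the prototype assigned to each sample
    cos = np.sum(normalize(features) * targets, axis=1)
    return float(np.mean(1.0 - cos))


def feature_distillation_loss(new_feats: np.ndarray, old_feats: np.ndarray) -> float:
    # Mean squared distance between normalized features of f_t and the frozen f_{t-1}.
    diff = normalize(new_feats) - normalize(old_feats)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def total_loss(new_feats, old_feats, labels, etf, align_weight=1.0, distill_weight=1.0):
    # Hypothetical weighting; the paper's exact objective may differ.
    return (align_weight * etf_alignment_loss(new_feats, labels, etf)
            + distill_weight * feature_distillation_loss(new_feats, old_feats))
```

In an actual training loop these terms would be written in an autodiff framework so gradients flow into $f_t$; NumPy is used here only to keep the sketch dependency-light.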

Theorems & Definitions (3)

  • Definition 1: Simplex Equiangular Tight Frame
  • Theorem 1
  • Lemma 1: Nearest Orthogonal Matrix via SVD
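Lemma 1 matches the classical orthogonal Procrustes (polar decomposition) result, assuming it is stated in the Frobenius norm as is standard: if $\mathbf{A} = \mathbf{W\Sigma V}^\top$, the closest matrix with orthonormal columns is $\mathbf{W}\mathbf{V}^\top$. The sketch below checks this numerically against random orthonormal candidates; `nearest_orthogonal` is a hypothetical helper name.

```python
import numpy as np


def nearest_orthogonal(A: np.ndarray) -> np.ndarray:
    """Nearest matrix with orthonormal columns to A (d x K, d >= K) in Frobenius norm.

    If A = W Sigma V^T is the thin SVD, the minimizer is W V^T.
    """
    W, _, Vt = np.linalg.svd(A, full_matrices=False)
    return W @ Vt


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))
Q = nearest_orthogonal(A)
best = np.linalg.norm(A - Q)
# Distances from A to 1000 random matrices with orthonormal columns.
others = [np.linalg.norm(A - np.linalg.qr(rng.standard_normal((8, 4)))[0]) for _ in range(1000)]
print(np.allclose(Q.T @ Q, np.eye(4)))  # True: columns are orthonormal
print(best <= min(others))               # True: no random candidate is closer
```

The reason is that $\|\mathbf{A}-\mathbf{Q}\|_F^2 = \|\mathbf{A}\|_F^2 + K - 2\,\mathrm{tr}(\mathbf{Q}^\top\mathbf{A})$, and $\mathrm{tr}(\mathbf{Q}^\top\mathbf{A})$ is at most the sum of the singular values of $\mathbf{A}$, with equality at $\mathbf{Q} = \mathbf{W}\mathbf{V}^\top$.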