Enhancing Pre-Trained Model-Based Class-Incremental Learning through Neural Collapse

Kun He, Zijian Song, Shuoxi Zhang, John E. Hopcroft

TL;DR

This work addresses how features should evolve in pre-trained model–based class-incremental learning by invoking neural collapse as a guiding geometry. It introduces NCPTM-CIL, which enforces a dynamic, equiangular prototype structure through a Dynamic ETF classifier, ETF alignment, and PAP loss to preserve inter-class separation as new classes arrive. Empirical results on CIFAR-100, CUB-200, VTAB, and OmniBenchmark show state-of-the-art performance and a small gap to the joint-learning upper bound, illustrating strong resistance to catastrophic forgetting. The proposed geometry-aware framework offers a principled approach to leveraging pre-trained representations for scalable, continual learning.

Abstract

Class-Incremental Learning (CIL) is a critical capability for real-world applications, enabling learning systems to adapt to new tasks while retaining knowledge from previous ones. Recent progress in pre-trained models (PTMs) has significantly advanced CIL, with PTM-based methods outperforming traditional ones. However, understanding how features evolve and are distributed across incremental tasks remains an open challenge. In this paper, we propose a novel approach to modeling feature evolution in PTM-based CIL through the lens of neural collapse (NC), a striking phenomenon observed in the final phase of training, which leads to a well-separated, equiangular feature space. We explore the connection between NC and CIL effectiveness, showing that aligning feature distributions with the NC geometry enhances the ability to capture the dynamic behavior of continual learning. Based on this insight, we introduce Neural Collapse-inspired Pre-Trained Model-based CIL (NCPTM-CIL), a method that dynamically adjusts the feature space to conform to the NC structure, thereby enhancing the continual learning process. Extensive experiments demonstrate that NCPTM-CIL outperforms state-of-the-art methods across four benchmark datasets. Notably, when initialized with ViT-B/16-IN1K, NCPTM-CIL surpasses the runner-up method by 6.73% on VTAB, 1.25% on CIFAR-100, and 2.5% on OmniBenchmark.

Paper Structure

This paper contains 21 sections, 10 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: (a): The $\mathcal{NC}_\mathbf{2}$ metric with respect to the number of epochs during fine-tuning for ViT-B/16-IN1K and ViT-B/16-IN21K. (b): The $\mathcal{NC}_\mathbf{2}$ values of different PTM-CIL methods on the VTAB dataset during the incremental learning phase, where VTAB is divided into five incremental learning tasks, each containing 10 classes.
  • Figure 2: Illustration of NCPTM-CIL. Phase I: Base Learning. We utilize the initial task dataset $\mathcal{D}_1$ to fine-tune the ViT pre-trained model using the VPT Deep (Jia et al., 2022) and AdaptFormer (Chen et al., 2022) methods. Phase II: Incremental Learning. We freeze the fine-tuned ViT pre-trained model to extract the class mean features and store them in Class-Mean Features Pools. Subsequently, these class mean features are aligned and mapped to a Dynamic ETF Classifier through an Alignment layer. It is worth noting that the number of vertices in the Dynamic ETF Classifier is adjusted according to the number of currently learned classes. For example, if the total number of learned classes in Task $N-1$ is 3, then the number of classifier weight vectors is 3. If the total number of learned classes in Task $N$ is 4, then the number of classifier weight vectors is 4.
  • Figure 3: Description of the gradient of PAP Loss with respect to $\hat{c}_1$. For PAP Loss, the gradient ($-\frac{\partial L}{\partial\hat{c}_1}$) pulls $\hat{c}_{1}$ towards $\hat{\boldsymbol{w}}_{i}$, while the gradients from ($-\frac{\partial L}{\partial\hat{c}_1}$) and ($-\frac{\partial L}{\partial\hat{c}_1}$) act as a repulsive force, pushing $\hat{c}_{1}$ away from $\hat{\boldsymbol{w}}_{i}$ (the red dashed line indicates the direction of the gradient).
  • Figure 4: All methods are initialized with ViT-B/16-IN1K. We annotate the relative improvement of NCPTM-CIL over the runner-up method with numerical values at the final incremental stage.
  • Figure 5: The relationship between the performance of existing PTM-based CIL approaches and the $\mathcal{NC}_\mathbf{2}$ of their features.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Definition 1: Simplex ETF
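
The statement of Definition 1 is not reproduced here, so the following is only a minimal sketch of the standard simplex equiangular tight frame (ETF) construction from the neural collapse literature, $M = \sqrt{K/(K-1)}\, U (I_K - \frac{1}{K}\mathbf{1}\mathbf{1}^\top)$, where $U$ has orthonormal columns. It may differ in notation from the paper's own definition. The function names `simplex_etf` and `max_etf_deviation`, the random choice of $U$, and the feature dimension 768 are assumptions made for illustration. The loop rebuilds the frame for 3 and then 4 classes, mirroring the example in the Figure 2 caption; how NCPTM-CIL actually updates its Dynamic ETF Classifier between tasks is not specified in this summary.

```python
import numpy as np


def simplex_etf(feature_dim: int, num_classes: int, seed: int = 0) -> np.ndarray:
    """Return a (feature_dim, num_classes) matrix whose columns form a simplex ETF.

    Hypothetical helper: this particular construction assumes
    feature_dim >= num_classes so that U can have orthonormal columns.
    """
    if num_classes < 2 or feature_dim < num_classes:
        raise ValueError("need 2 <= num_classes <= feature_dim")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix gives U with orthonormal columns.
    u, _ = np.linalg.qr(rng.standard_normal((feature_dim, num_classes)))
    k = num_classes
    # Centering matrix I_K - (1/K) 11^T removes the common mean direction.
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * (u @ centering)


def max_etf_deviation(w: np.ndarray) -> float:
    """Largest absolute gap between the Gram matrix of w and the ideal ETF Gram matrix.

    The ideal Gram matrix has 1 on the diagonal (unit-norm vectors) and
    -1/(K-1) off the diagonal (equal pairwise cosines).
    """
    k = w.shape[1]
    target = np.full((k, k), -1.0 / (k - 1))
    np.fill_diagonal(target, 1.0)
    return float(np.abs(w.T @ w - target).max())


if __name__ == "__main__":
    # Rebuild the frame as the number of learned classes grows from 3 to 4.
    for k in (3, 4):
        w = simplex_etf(feature_dim=768, num_classes=k)
        print(k, w.shape, max_etf_deviation(w) < 1e-8)
```

For any valid $K$, every column has unit norm and every pair of distinct columns has cosine $-1/(K-1)$, which is the equiangular, maximally separated geometry that the abstract associates with neural collapse.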