AnaCP: Toward Upper-Bound Continual Learning via Analytic Contrastive Projection

Saleh Momeni, Changnan Xiao, Bing Liu

TL;DR

AnaCP tackles class-incremental learning (CIL) by adapting the features of a frozen pre-trained model (PTM) analytically, without gradient updates, through a contrastive projection layer. The layer combines positive alignment via prototype regression with negative repulsion via target-prototype separation. It is followed by an analytic classifier and pseudo-replay with a shared covariance, which keep the method resistant to catastrophic forgetting (CF). AnaCP reaches accuracies close to or above joint training on several benchmarks and is efficient in memory and time despite a larger parameter footprint. Overall, it offers a scalable, CF-free way to use powerful PTMs for continual learning, with room to improve when weaker PTMs are used and a potential extension to task- or domain-incremental settings.
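The pseudo-replay idea mentioned above can be illustrated with a minimal sketch. It assumes that each class is summarized by its mean feature, that a single covariance is pooled across all classes, and that replayed features are drawn from a Gaussian; the summary only states that pseudo-replay uses a shared covariance, so the Gaussian sampling, the pooled estimator, and the name `PseudoReplayStats` are illustrative assumptions rather than the authors' procedure.

```python
import numpy as np

class PseudoReplayStats:
    """Stores per-class means and one covariance shared by all classes (assumed design)."""

    def __init__(self, dim):
        self.means = {}                       # class id -> mean feature vector
        self.scatter = np.zeros((dim, dim))   # pooled within-class scatter
        self.count = 0                        # number of samples seen so far

    def add_class(self, class_id, feats):
        mean = feats.mean(axis=0)
        centered = feats - mean
        self.means[class_id] = mean
        self.scatter += centered.T @ centered
        self.count += feats.shape[0]

    def shared_cov(self):
        # Pooled covariance estimate: total scatter over (samples - classes).
        return self.scatter / max(self.count - len(self.means), 1)

    def sample(self, class_id, n, rng):
        # Draw pseudo-features for a previously seen class; no raw data is stored.
        return rng.multivariate_normal(self.means[class_id], self.shared_cov(), size=n)

rng = np.random.default_rng(0)
stats = PseudoReplayStats(dim=8)
for c in range(3):
    # Synthetic stand-in for PTM features of class c.
    stats.add_class(c, rng.normal(loc=c, size=(100, 8)))
replayed = stats.sample(0, n=20, rng=rng)
```

Under these assumptions, memory grows with one mean vector per class plus a single covariance matrix, rather than with the number of stored samples.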

Abstract

This paper studies the problem of class-incremental learning (CIL), a core setting within continual learning where a model learns a sequence of tasks, each containing a distinct set of classes. Traditional CIL methods, which do not leverage pre-trained models (PTMs), suffer from catastrophic forgetting (CF) due to the need to incrementally learn both feature representations and the classifier. The integration of PTMs into CIL has recently led to efficient approaches that treat the PTM as a fixed feature extractor combined with analytic classifiers, achieving state-of-the-art performance. However, they still face a major limitation: the inability to continually adapt feature representations to best suit the CIL tasks, leading to suboptimal performance. To address this, we propose AnaCP (Analytic Contrastive Projection), a novel method that preserves the efficiency of analytic classifiers while enabling incremental feature adaptation without gradient-based training, thereby eliminating the CF caused by gradient updates. Our experiments show that AnaCP not only outperforms existing baselines but also achieves the accuracy level of joint training, which is regarded as the upper bound of CIL.
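To make the notion of an analytic classifier on fixed features concrete, here is a generic sketch of ridge regression on randomly projected features, updated task by task from accumulated statistics. It is not AnaCP itself (there is no contrastive projection layer), and the synthetic features, the ReLU nonlinearity, the ridge value, and all names are assumptions made for illustration.

```python
import numpy as np

def random_features(x, w_rand):
    """Fixed random projection followed by ReLU; w_rand is never trained."""
    return np.maximum(x @ w_rand, 0.0)

class AnalyticClassifier:
    """Closed-form ridge classifier built from running sums (illustrative only)."""

    def __init__(self, feat_dim, num_classes, ridge=1.0):
        self.gram = np.zeros((feat_dim, feat_dim))      # running sum of H^T H
        self.cross = np.zeros((feat_dim, num_classes))  # running sum of H^T Y
        self.ridge = ridge
        self.weights = None

    def learn_task(self, h, labels):
        y = np.eye(self.cross.shape[1])[labels]         # one-hot targets
        self.gram += h.T @ h
        self.cross += h.T @ y
        reg = self.ridge * np.eye(self.gram.shape[0])
        self.weights = np.linalg.solve(self.gram + reg, self.cross)

    def predict(self, h):
        return np.argmax(h @ self.weights, axis=1)

rng = np.random.default_rng(0)
num_classes, in_dim, rand_dim = 4, 16, 64
means = rng.normal(size=(num_classes, in_dim)) * 3.0    # synthetic stand-in for PTM features
labels = np.repeat(np.arange(num_classes), 50)
feats = means[labels] + rng.normal(size=(labels.size, in_dim))
h = random_features(feats, rng.normal(size=(in_dim, rand_dim)))

# Learn two tasks in sequence, each with two new classes.
incremental = AnalyticClassifier(rand_dim, num_classes)
for task_classes in ([0, 1], [2, 3]):
    mask = np.isin(labels, task_classes)
    incremental.learn_task(h[mask], labels[mask])

# Learn all classes at once for comparison.
joint = AnalyticClassifier(rand_dim, num_classes)
joint.learn_task(h, labels)
weights_match = np.allclose(incremental.weights, joint.weights)
```

Because the two running sums are additive over tasks, the sequentially updated weights coincide with the jointly fitted ones up to floating-point error, which is why this kind of classifier does not forget earlier classes.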

Paper Structure

This paper contains 16 sections, 2 theorems, 28 equations, 2 figures, 3 tables.

Key Result

Lemma 4.1

Let $w_1, \dots, w_C \in \mathbb{R}^C$ be arbitrary vectors, and let $e_1, \dots, e_C \in \mathbb{R}^C$ be an orthogonal basis of $\mathbb{R}^C$. Denote $\langle x, y \rangle = x^\top y$ as the inner product and $\theta(x, y) = \frac{x^\top y}{\|x\| \cdot \|y\|}$ as the cosine similarity. There s.t.
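The two quantities defined in the lemma are easy to compute numerically. The sketch below uses the standard basis of $\mathbb{R}^C$ as one example of an orthogonal basis; the function names `inner` and `cosine` and the choice $C = 4$ are illustrative assumptions.

```python
import numpy as np

def inner(x, y):
    """Inner product <x, y> = x^T y."""
    return float(x @ y)

def cosine(x, y):
    """Cosine similarity theta(x, y) = x^T y / (||x|| * ||y||)."""
    return inner(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

C = 4
basis = np.eye(C)        # e_1, ..., e_C as rows: the standard basis
gram = basis @ basis.T   # pairwise inner products <e_i, e_j>

rng = np.random.default_rng(0)
w = rng.normal(size=C)   # an arbitrary vector, as in the lemma
```

Distinct basis vectors have zero inner product, so `gram` is diagonal, and `cosine(w, 2.0 * w)` equals 1 up to rounding because the cosine similarity ignores positive rescaling.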

Figures (2)

  • Figure 1: (Left) Architecture of AnaCP: The random projection layer uses fixed, randomly assigned weights, while the contrastive projection layer weights are computed analytically. Compared to the ELM architecture, AnaCP introduces an additional contrastive projection layer that adapts the feature representations. (Right) t-SNE visualization of the input features (first five classes of ImageNet-R) and their enhancement after contrastive projection using DINO-v2 as the PTM. Random features are generally more separable than input features due to their higher dimensionality, even though this is not easily observable in the 2D t-SNE map.
  • Figure 2: Accuracy matrix $A[t][i]$ on CIFAR-100 with 10 task splits using DINO-v2 as the backbone under the TIL setting. Each entry shows the accuracy on task $i$ after training on the first $t$ tasks.
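An accuracy matrix of the kind shown in Figure 2 is commonly summarized by the average accuracy after the last task and by per-task forgetting. The sketch below uses a hypothetical 3-task matrix with zero-based indices; the numbers are made up, and both summary metrics are standard continual-learning conventions assumed here rather than quantities taken from the paper.

```python
import numpy as np

# Hypothetical accuracy matrix: A[t][i] is the accuracy on task i after
# training on tasks 0..t, so entries with i > t are undefined (NaN).
A = np.array([
    [0.95, np.nan, np.nan],
    [0.93, 0.90, np.nan],
    [0.92, 0.89, 0.91],
])

def final_average_accuracy(acc):
    """Mean of the last row: accuracy over all tasks after the final task."""
    return float(np.mean(acc[-1]))

def forgetting(acc):
    """Best earlier accuracy on each old task minus its final accuracy."""
    best_earlier = np.nanmax(acc[:-1, :-1], axis=0)
    return best_earlier - acc[-1, :-1]

avg_acc = final_average_accuracy(A)
forgot = forgetting(A)
```

A method that does not forget would show near-zero values in `forgot`, meaning that each column of the matrix stays flat as later tasks are learned.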

Theorems & Definitions (3)

  • Lemma 4.1
  • Lemma A.1
  • Proof