CEAT: Continual Expansion and Absorption Transformer for Non-Exemplar Class-Incremental Learning

Xinyuan Gao, Songlin Dong, Yuhang He, Xing Wei, Yihong Gong

TL;DR

CEAT tackles Non-Exemplar Class-Incremental Learning (NECIL) by freezing a Vision Transformer backbone while learning task-specific ex-fusion layers, which are absorbed losslessly after each task so that the model size stays constant. It combines a prototype-based batch interpolation mechanism with a prototype contrastive loss to keep old and new classes clearly separated without storing old samples. Because continual learning proceeds without growing the parameter count, the approach is practical for edge devices. Empirical results on CIFAR-100, TinyImageNet, and ImageNet-Subset show substantial gains over prior NECIL methods, supporting the approach's effectiveness.

Abstract

In real-world applications, dynamic scenarios require models to learn new tasks continuously without forgetting old knowledge. Experience-replay methods store a subset of old images for joint training. Under stricter privacy protection, storing old images becomes infeasible, which leads to a more severe plasticity-stability dilemma and classifier bias. To meet these challenges, we propose a new architecture named Continual Expansion and Absorption Transformer (CEAT). The model learns novel knowledge by adding expanded-fusion (ex-fusion) layers in parallel with the frozen previous parameters. After the task ends, we losslessly absorb the extended parameters into the backbone so that the number of parameters remains constant. To improve the learning ability of the model, we design a novel prototype contrastive loss that reduces the overlap between old and new classes in the feature space. In addition, to address the classifier bias towards new classes, we propose a novel approach that generates pseudo-features to correct the classifier. We evaluate our method on three standard Non-Exemplar Class-Incremental Learning (NECIL) benchmarks. Extensive experiments demonstrate that our model improves significantly over previous work, achieving gains of 5.38%, 5.20%, and 4.92% on CIFAR-100, TinyImageNet, and ImageNet-Subset, respectively.
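
The expand-then-absorb idea in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the frozen layer is a single linear map and that the parallel branch is a bias-free linear map of the same shape; the abstract does not specify the exact form of the ex-fusion layers, and the function names below (`forward_expanded`, `absorb`, `forward_absorbed`) are hypothetical. Under that assumption, absorption reduces to adding the two weight matrices, so the output is unchanged (up to floating-point rounding) and the parameter count returns to its original size.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def vec_add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [p + q for p, q in zip(a, b)]

def forward_expanded(W, b, A, x):
    """Training-time forward pass: frozen (W, b) plus the parallel branch A.

    Assumption: the parallel branch is a bias-free linear map.
    """
    return vec_add(vec_add(matvec(W, x), b), matvec(A, x))

def absorb(W, A):
    """Fold the branch into the frozen weight: W_merged = W + A."""
    return [vec_add(row_w, row_a) for row_w, row_a in zip(W, A)]

def forward_absorbed(W_merged, b, x):
    """Forward pass after absorption: a single linear layer again."""
    return vec_add(matvec(W_merged, x), b)

def random_matrix(rows, cols, rng):
    """Matrix with standard-normal entries, as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_in, d_out = 4, 3
W = random_matrix(d_out, d_in, rng)   # frozen weight from earlier tasks
A = random_matrix(d_out, d_in, rng)   # branch trained on the current task
b = [rng.gauss(0.0, 1.0) for _ in range(d_out)]
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

y_expanded = forward_expanded(W, b, A, x)
W_merged = absorb(W, A)
y_absorbed = forward_absorbed(W_merged, b, x)

# The two outputs agree up to floating-point rounding.
max_gap = max(abs(p - q) for p, q in zip(y_expanded, y_absorbed))
print(f"max output gap after absorption: {max_gap:.2e}")
```

The identity behind the sketch is $Wx + b + Ax = (W + A)x + b$. It holds only when the added branch is linear in $x$; a branch containing a nonlinearity could not be folded into the weights this way.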

Paper Structure (33 sections, 12 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: The overall process of our continual expansion and absorption. Each ViT layer consists of two LayerNorm layers, one incremental MHSA (Inc-MHSA), and one incremental MLP (Inc-MLP). In task $t > 0$, the backbone $F_{\theta}^{t-1}$ is frozen, and the ex-fusion parameters $A_{\psi}^{t}$ are trainable to learn the new task. After the task ends, parameters $A_{\psi}^{t}$ are absorbed losslessly into the backbone for the next task.
  • Figure 2: The overall structure of our model. Our model consists of two classical SABs (Self-Attention Blocks), four Inc-SABs, which have expanded parameters, and a classifier. The current images pass through the feature extractor, and the resulting features are concatenated with the pseudo-features to balance the classifier. The concatenated features are also used to compute the PCL (prototype contrastive loss), which reduces the overlap between old and new classes.
  • Figure 3: Visualization of old-class features produced by the final model in the 5-step CIFAR-100 setting. We use the old images from the first 10 categories to evaluate how well the final model preserves the discriminative features of the initial classes.
  • Figure 4: Performance in the 5-step CIFAR-100 setting. Note that at the first step, before the continual process begins (marked by a dotted rectangle), our model performs comparably to other recent methods [zhu2021prototype, zhu2022self]. This indicates that our method improves performance by addressing the plasticity-stability dilemma.
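
The classifier-balancing step described in the Figure 2 caption (current features concatenated with pseudo-features) can be sketched as follows. The source names a prototype-based batch interpolation mechanism but gives no formula here, so the convex combination, the mixing weight `lam`, the random choice of old class, and the helper names `interpolate` and `make_pseudo_features` are all assumptions made for illustration, not the authors' procedure.

```python
import random

def interpolate(prototype, feature, lam):
    """Convex combination lam * prototype + (1 - lam) * feature (assumed form)."""
    return [lam * p + (1.0 - lam) * f for p, f in zip(prototype, feature)]

def make_pseudo_features(prototypes, batch_features, lam, rng):
    """Build one pseudo-feature per current feature.

    prototypes: dict mapping an old-class label to its stored mean feature.
    Returns a list of (pseudo_feature, old_label) pairs.
    """
    old_labels = sorted(prototypes)
    pseudo = []
    for feature in batch_features:
        label = rng.choice(old_labels)  # pick an old class at random
        pseudo.append((interpolate(prototypes[label], feature, lam), label))
    return pseudo

rng = random.Random(0)
prototypes = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0]}   # old classes 0 and 1
batch = [([0.2, 0.1, 0.9], 2), ([0.0, 0.3, 0.8], 3)]    # new classes 2 and 3

pseudo = make_pseudo_features(
    prototypes, [feature for feature, _ in batch], lam=0.8, rng=rng
)

# Concatenate real and pseudo samples so the classifier sees old-class labels too.
classifier_batch = batch + pseudo
print(len(classifier_batch), sorted({label for _, label in classifier_batch}))
```

No old images are needed in this sketch: only one stored mean feature per old class. The concatenated batch would then feed both the classifier loss and the prototype contrastive loss.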