An Efficient Replay for Class-Incremental Learning with Pre-trained Models

Weimin Yin, Bin Chen, Chunzhao Xie, Zhenhao Tan

TL;DR

This paper tackles catastrophic forgetting in class-incremental learning with large pre-trained models (PTMs). It introduces Weight Balancing Replay (WBR), which uses a single memory vector per past task to guide gradient updates and balance old and new knowledge without maintaining large replay buffers. By extending the notion of bias from the classifier to arbitrary network weights, and by computing an approximate bias from memory-driven activations under gradient-clipping constraints, WBR mitigates forgetting quickly and parameter-efficiently and performs strongly on PTM-based benchmarks. The approach is most relevant to real-world deployments where memory and compute are constrained, since it preserves accuracy while substantially reducing training cost.

Abstract

In general class-incremental learning, researchers typically use sample sets as a tool to avoid catastrophic forgetting during continual learning. At the same time, researchers have also noted the differences between class-incremental learning and Oracle training and have attempted to correct for them. In recent years, researchers have begun to develop class-incremental learning algorithms that use pre-trained models, achieving significant results. This paper observes that in class-incremental learning, the steady state among the weights guided by each class center is disrupted, and that this disruption is significantly correlated with catastrophic forgetting. Based on this observation, we propose a new method for overcoming forgetting. In some cases, retaining only a single sample unit of each class in memory for replay and applying simple gradient constraints is enough to achieve very good results. Experimental results indicate that, with pre-trained models, our method achieves competitive performance at very low computational cost while using only the cross-entropy loss.
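
The idea of replaying one stored unit per class under a simple gradient constraint can be illustrated with a toy sketch. This is not the authors' WBR implementation. It assumes a linear softmax classifier, uses the class mean as a hypothetical stand-in for the paper's memory vector, and uses global gradient-norm clipping as a stand-in for the paper's gradient constraints. All function names and hyperparameters below are illustrative.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def class_mean(samples):
    """Hypothetical memory vector: the mean feature of one class."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def clip_by_norm(grad, max_norm):
    """Rescale the whole gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [[g * scale for g in row] for row in grad]

def train_step(W, batch, lr, max_norm):
    """One cross-entropy gradient step on a linear classifier W[class][feature]."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for x, y in batch:
        p = softmax([sum(w * v for w, v in zip(row, x)) for row in W])
        for c in range(len(W)):
            err = p[c] - (1.0 if c == y else 0.0)
            for j, v in enumerate(x):
                grad[c][j] += err * v / len(batch)
    grad = clip_by_norm(grad, max_norm)  # assumed stand-in for the gradient constraint
    return [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, grad)]

def predict(W, x):
    scores = [sum(w * v for w, v in zip(row, x)) for row in W]
    return scores.index(max(scores))

# Task 1 (classes 0 and 1): keep a single memory vector per class.
task1 = {0: [[1.0, 0.0], [0.9, 0.1]], 1: [[0.0, 1.0], [0.1, 0.9]]}
memory = [(class_mean(xs), c) for c, xs in task1.items()]

# Task 2 (class 2): train on new samples plus the replayed memory vectors.
task2 = [([-1.0, -1.0], 2), ([-0.9, -1.1], 2)]
W = [[0.0, 0.0] for _ in range(3)]
for _ in range(200):
    W = train_step(W, task2 + memory, lr=0.5, max_norm=1.0)
```

In this toy setting the replayed vectors keep classes 0 and 1 represented in every update, while the clipping bounds how far any single step can move the weights toward the new class. How the paper actually samples memory vectors and constrains gradients is defined in its method section, not here.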

Paper Structure

This paper contains 15 sections, 12 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Overview of the WBR Framework. In contrast to typical methods that use sample buffers to incrementally adjust the entire or partial model weights to avoid catastrophic forgetting, WBR utilizes a single memory vector to represent all samples within a task, guiding the model to prevent forgetting. During the supervised learning of new tasks, WBR balances the proportion of old and new tasks in the model's weights by controlling the magnitude of gradient updates. Experimental results demonstrate that this balance is directly related to the occurrence of catastrophic forgetting. Notably, the maximum memory pool size we use is smaller than that of a single 224x224 image.
  • Figure 2: Illustration of WBR During Training. First, WBR samples and retains memory vectors during the learning of historical tasks based on our proposed bias approximation mechanism. Then, WBR incorporates all historical memory vectors into the training of the new task and optimizes the network using the loss function defined in the paper's loss equation. The goal is to learn the new task while maintaining a balance in the weights, guiding the network to avoid forgetting.
  • Figure 3: Ablation Study on split MNIST. The left figure shows the impact of network depth on the control of the first layer's memory vector when $\alpha$ and $\beta$ are not set. The middle figure illustrates the effect of different learning rates on mitigating forgetting. The right figure depicts the impact of different learning constraints $\alpha$ and memory constraints $\beta$ on forgetting when the learning rate is set to 0.01 and no hidden layers are used.
  • Figure 4: The two figures on the left show the results of WBR based on ViT-B-16 (pretrained on ImageNet-1K) on Split CIFAR-100, while the right figure displays the accuracy of learning new tasks at each stage. Notably, when the hyperparameters are appropriately set, the accuracy on new tasks stops improving and instead decreases. Catastrophic forgetting typically occurs when new tasks overfit, causing the performance on old tasks to degrade. This suggests that, at least locally, the balance of weights is closely related to catastrophic forgetting.
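
The Figure 1 caption states that the largest memory pool is smaller than a single 224x224 image. The arithmetic below shows what that bound implies under stated assumptions. It assumes a 3-channel image, that sizes are compared by number of stored values, and that each memory vector has 768 dimensions (the ViT-B hidden size). The captions above do not give the actual vector dimension or pool size, so the 10-task configuration is hypothetical.

```python
def image_values(height=224, width=224, channels=3):
    """Number of scalar values in one image (assumes 3 channels)."""
    return height * width * channels

def memory_pool_values(num_vectors, dim):
    """Number of scalar values in a pool of equally sized memory vectors."""
    return num_vectors * dim

IMAGE = image_values()  # 224 * 224 * 3 = 150528

# Hypothetical configuration: 10 tasks, one 768-dimensional vector per task.
POOL = memory_pool_values(num_vectors=10, dim=768)  # 7680

# Largest number of 768-dimensional vectors that stays strictly below one image.
MAX_VECTORS = (IMAGE - 1) // 768  # 195

print(IMAGE, POOL, MAX_VECTORS)
```

Under these assumptions a 10-vector pool is about 5% of one image, and the bound in the caption would hold for any pool of up to 195 such vectors.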