
Adapt before Continual Learning

Aojun Lu, Tao Feng, Hangjie Yuan, Chunhui Ding, Yanan Sun

TL;DR

This work tackles the stability-plasticity trade-off in continual learning (CL) with pre-trained models (PTMs) by introducing ACL, a plug-and-play adaptation phase that realigns the PTM feature space before each new task is learned. During adaptation, the backbone is fine-tuned to pull each embedding toward its class's original prototype while pushing it away from the prototypes of other classes. This is formalized by the ACL loss $\mathcal{L}_{\text{ACL}}(x_i,y_i)=-\log\frac{\exp(\cos(\phi^{*}(x_i),p_{y_i})/\tau)}{\sum_j \exp(\cos(\phi^{*}(x_i),p_j)/\tau)}$, where $\phi^{*}$ is the adapted backbone, $\tau$ is a temperature, the prototypes $p_c=\mathbb{E}_{(x,y)\in\mathcal{D}_k,\,y=c}[\phi(x)]$ are computed with the pre-adaptation backbone $\phi$ on the current task's data $\mathcal{D}_k$, and embeddings are unit-normalized. The framework provides theoretical guarantees that minimizing the ACL loss promotes plasticity (it reduces a bound on current-task error) while implicitly regularizing feature drift to preserve stability. Empirically, ACL improves last accuracy (LA) and average incremental accuracy (AIA) across multiple CL baselines on domain-shift benchmarks such as ImageNet-R and ImageNet-A, and it remains effective across backbones (e.g., ViT-B/16-IN1K, ViT-B/16-IN21K) and even CLIP-based setups, at the cost of some additional GPU memory. These results position ACL as a general plug-in enhancement for PTM-based CL pipelines, improving the practicality of continual adaptation in real-world, non-stationary environments.
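
A minimal, dependency-free sketch of the prototype and loss definitions above follows. It assumes features are plain Python lists, normalizes each feature before averaging (one reading of "unit-normalized embeddings"), and uses a hypothetical temperature `tau=0.1`. The function names are illustrative and are not taken from the authors' code.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity cos(a, b)."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def class_prototypes(features: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, Vector]:
    """p_c: mean of the unit-normalized pre-adaptation features phi(x) with label c."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(features, labels):
        f = normalize(f)  # assumption: normalize each feature before averaging
        running = sums.setdefault(y, [0.0] * len(f))
        for k, x in enumerate(f):
            running[k] += x
        counts[y] = counts.get(y, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def acl_loss(embedding: Sequence[float], label: int,
             prototypes: Dict[int, Vector], tau: float = 0.1) -> float:
    """-log softmax of cos(phi*(x), p_j) / tau, evaluated at the true class."""
    classes = sorted(prototypes)
    logits = [cosine(embedding, prototypes[c]) / tau for c in classes]
    m = max(logits)  # log-sum-exp shift for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[classes.index(label)]

# Toy usage: two classes in 2-D. In ACL, `embedding` would come from the adapted backbone phi*.
protos = class_prototypes([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], [0, 0, 1, 1])
near = acl_loss([1.0, 0.0], 0, protos)  # close to its own prototype: small loss
far = acl_loss([0.0, 1.0], 0, protos)   # closer to the other class's prototype: large loss
```

The loss is an ordinary cross-entropy whose logits are scaled cosine similarities to fixed prototypes. During adaptation, only the embedding side ($\phi^{*}$) would receive gradients, while the prototypes stay fixed for the task.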

Abstract

Continual Learning (CL) seeks to enable neural networks to incrementally acquire new knowledge (plasticity) while retaining existing knowledge (stability). Although pre-trained models (PTMs) have provided a strong foundation for CL, existing approaches face a fundamental challenge in balancing these two competing objectives. Current methods typically address stability by freezing the PTM backbone, which severely limits the model's plasticity, particularly when the incoming data distribution diverges substantially from the pre-training data. Alternatively, sequentially fine-tuning the entire PTM can adapt to new knowledge but often leads to catastrophic forgetting, highlighting the critical stability-plasticity trade-off in PTM-based CL. To address this limitation, we propose Adapting PTMs before the core CL process (ACL), a novel framework that introduces a plug-and-play adaptation phase prior to learning each new task. During this phase, ACL refines the PTM backbone by aligning embeddings with their original class prototypes while distancing them from the prototypes of other classes. We show, both theoretically and empirically, that this mechanism achieves a desirable balance between stability and plasticity, significantly improving CL performance across benchmarks and integrated methods. Code is available at https://github.com/byyx666/ACL_code.
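
The per-task control flow described above (adapt first, then run the unchanged CL method on the frozen, adapted backbone) can be sketched as a small orchestration loop. This is a hypothetical skeleton rather than the authors' implementation. `Backbone`, `run_acl`, `toy_adapt`, and `toy_learn` are illustrative names, and the toy callables only record what happened instead of training anything.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

@dataclass
class Backbone:
    # Hypothetical stand-in for PTM weights; `version` counts adaptation rounds.
    version: int = 0
    trainable: bool = False

AdaptFn = Callable[[Backbone, Any], Backbone]
LearnFn = Callable[[Backbone, Any, Optional[Any]], Any]

def run_acl(tasks: Sequence[Any], backbone: Backbone, adapt: AdaptFn,
            learn: LearnFn) -> Tuple[Backbone, Any, List[Tuple[int, int, bool]]]:
    """For each task: (1) adapt the backbone, (2) freeze it and run the plugged-in CL method."""
    cl_state: Optional[Any] = None
    trace: List[Tuple[int, int, bool]] = []
    for t, task in enumerate(tasks):
        # Phase 1: unfreeze and adapt on the current task's data only
        # (in ACL, this step would minimize the prototype-based ACL loss).
        backbone.trainable = True
        backbone = adapt(backbone, task)
        # Phase 2: freeze the adapted backbone; the CL method trains only its own modules.
        backbone.trainable = False
        cl_state = learn(backbone, task, cl_state)
        trace.append((t, backbone.version, backbone.trainable))
    return backbone, cl_state, trace

# Toy usage: adaptation bumps the version; "learning" just records which tasks were seen.
def toy_adapt(bb: Backbone, task: Any) -> Backbone:
    assert bb.trainable, "adaptation requires a trainable backbone"
    return Backbone(version=bb.version + 1, trainable=True)

def toy_learn(bb: Backbone, task: Any, state: Optional[List[Any]]) -> List[Any]:
    assert not bb.trainable, "the CL method should see a frozen backbone"
    return (state or []) + [task]

final_bb, seen, trace = run_acl(["task1", "task2", "task3"], Backbone(), toy_adapt, toy_learn)
```

Because `learn` receives only a frozen backbone and its own state, any existing CL method can be plugged in without modification. This is what "plug-and-play" refers to here.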

Paper Structure

This paper contains 26 sections, 6 theorems, 20 equations, 6 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

For the adapted model $\phi^{*}(\cdot)$, the probability of misclassifying a sample $x_i$ is upper-bounded by the expected ACL loss:
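
The following is a hedged illustration of why a bound of this kind can hold. It is a standard argument and not necessarily the paper's exact statement or constant. Suppose $x_i$ is assigned to its nearest prototype under cosine similarity, $\hat y_i=\arg\max_j \cos(\phi^{*}(x_i),p_j)$, and $\tau>0$. Then

$$
\begin{aligned}
\hat y_i \neq y_i
&\;\Rightarrow\; \exists\, j\neq y_i:\ \cos(\phi^{*}(x_i),p_j) \ge \cos(\phi^{*}(x_i),p_{y_i}) \\
&\;\Rightarrow\; \frac{\exp(\cos(\phi^{*}(x_i),p_{y_i})/\tau)}{\sum_{c}\exp(\cos(\phi^{*}(x_i),p_c)/\tau)} \le \frac{1}{2} \\
&\;\Rightarrow\; \mathcal{L}_{\text{ACL}}(x_i,y_i) \ge \log 2 .
\end{aligned}
$$

The second step holds because the denominator contains both $\exp(\cos(\phi^{*}(x_i),p_{y_i})/\tau)$ and a term at least as large. Since the loss is nonnegative, $\mathbb{1}[\hat y_i\neq y_i]\le \mathcal{L}_{\text{ACL}}(x_i,y_i)/\log 2$ holds pointwise, and taking expectations gives

$$
P(\hat y_i\neq y_i) \le \frac{\mathbb{E}\left[\mathcal{L}_{\text{ACL}}(x_i,y_i)\right]}{\log 2}.
$$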

Figures (6)

  • Figure 1: Performance comparison on ImageNet-A-Inc20 between the frozen PTM and the PTM adapted using our ACL. Plasticity: the average of the optimal accuracy of each task during CL; Stability: the average forgetting across previous tasks after learning the final task; Overall CL performance: the average accuracy across all tasks after learning the final task.
  • Figure 2: Illustration of ACL. ACL comprises two phases per task: (1) Adapting the PTM weights to enhance feature discriminability for the current task, and (2) Learning classification using the frozen adapted PTM and trainable modules.
  • Figure 3: Performance of the original ACL and its ablation variants: (1) using a standard classification loss for adaptation, (2) adapting on the first task only, and (3) adapting only lightweight modules while keeping the backbone frozen.
  • Figure 4: Performance with different adaptation epochs.
  • Figure 5: (a-b) Visualization of 2D feature representations using t-SNE. (c) Grad-CAM visualization, where important regions are highlighted with warm colors.
  • ...and 1 more figure
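
The three quantities in Figure 1, plus the LA and AIA metrics named in the TL;DR, can all be computed from a per-stage accuracy matrix. The following is a minimal sketch built on several assumptions:

- `acc[i][j]` holds the accuracy (%) on task j after learning task i.
- "Optimal accuracy" means the best accuracy a task reaches at any stage.
- Forgetting follows the common definition: best earlier accuracy minus final accuracy.
- Tasks are weighted equally. Class-incremental benchmarks often weight by sample count instead.

The numbers in the usage example are illustrative only, not results from the paper.

```python
from typing import Dict, Sequence

def cl_metrics(acc: Sequence[Sequence[float]]) -> Dict[str, float]:
    """acc[i][j]: accuracy on task j after learning task i (row i has i + 1 entries)."""
    T = len(acc)
    final = acc[-1]
    # Plasticity: best accuracy each task reaches during CL, averaged over tasks.
    plasticity = sum(max(acc[i][j] for i in range(j, T)) for j in range(T)) / T
    # Forgetting (lower = more stable): best earlier accuracy minus final accuracy,
    # averaged over every task except the last one.
    forgetting = (
        sum(max(acc[i][j] for i in range(j, T - 1)) - final[j] for j in range(T - 1)) / (T - 1)
        if T > 1 else 0.0
    )
    # LA ("overall CL performance"): mean accuracy over all tasks after the final task.
    last_acc = sum(final) / T
    # AIA: mean over stages of the average accuracy on all tasks seen so far.
    aia = sum(sum(row) / len(row) for row in acc) / T
    return {"plasticity": plasticity, "forgetting": forgetting, "LA": last_acc, "AIA": aia}

# Illustrative numbers for three tasks (not results from the paper).
metrics = cl_metrics([
    [90.0],
    [85.0, 80.0],
    [82.0, 78.0, 88.0],
])
```

Under these definitions, a frozen backbone tends to score low forgetting but also low plasticity, while full fine-tuning shows the reverse. ACL is reported to improve the balance between the two.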

Theorems & Definitions (10)

  • Proposition 1
  • proof
  • Lemma 1
  • Lemma 2
  • Proposition 2
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • proof