SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models

Linglan Zhao, Xuerui Zhang, Ke Yan, Shouhong Ding, Weiran Huang

TL;DR

SAFE inherits general knowledge from foundation models through a transfer loss, uses a cross-classification loss with feature alignment to circumvent catastrophic forgetting, and introduces an entropy-based aggregation strategy to dynamically exploit the complementarity of the slow and fast learners.

Abstract

Continual learning aims to incrementally acquire new concepts from data streams while resisting the forgetting of previous knowledge. With the rise of powerful pre-trained models (PTMs), there is growing interest in training incremental learning systems on these foundation models rather than learning from scratch. Existing works often treat PTMs as a strong starting point and directly apply parameter-efficient tuning (PET) in the first session to adapt to downstream tasks. In the following sessions, most methods freeze model parameters to counter forgetting. However, applying PET directly to downstream data cannot fully exploit the inherent knowledge in PTMs. Additionally, freezing the parameters in incremental sessions hinders the model's plasticity toward novel concepts not covered in the first session. To solve these issues, we propose a Slow And Fast parameter-Efficient tuning (SAFE) framework. In particular, to inherit general knowledge from foundation models, we include a transfer loss function that measures the correlation between the PTM and the PET-applied model. After calibration in the first session, the slow efficient tuning parameters can capture more informative features, improving generalization to incoming classes. Moreover, to further incorporate novel concepts, we strike a balance between stability and plasticity by fixing the slow efficient tuning parameters and continuously updating the fast ones. Specifically, a cross-classification loss with feature alignment is proposed to circumvent catastrophic forgetting. During inference, we introduce an entropy-based aggregation strategy to dynamically exploit the complementarity of the slow and fast learners. Extensive experiments on seven benchmark datasets verify the effectiveness of our method, which significantly surpasses the state of the art.
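
The abstract says the transfer loss measures the correlation between the PTM and the PET-applied model but does not give its formula. The sketch below shows one common way to write such a term, a cross-correlation objective over a batch of features; it is an assumption for illustration, not the paper's definition. The names `cross_correlation`, `transfer_loss`, and `off_diag_weight` are hypothetical.

```python
import numpy as np

def cross_correlation(f_ptm, f_pet, eps=1e-8):
    """Cross-correlation matrix between two (batch, dim) feature arrays."""
    # Standardize every feature dimension over the batch.
    z_ptm = (f_ptm - f_ptm.mean(axis=0)) / (f_ptm.std(axis=0) + eps)
    z_pet = (f_pet - f_pet.mean(axis=0)) / (f_pet.std(axis=0) + eps)
    return z_ptm.T @ z_pet / f_ptm.shape[0]

def transfer_loss(f_ptm, f_pet, off_diag_weight=0.1):
    """Assumed form: pull matching dimensions together, decorrelate the rest."""
    c = cross_correlation(f_ptm, f_pet)
    diag = np.diag(c)
    on_diag = ((1.0 - diag) ** 2).sum()         # matching dims should correlate
    off_diag = (c ** 2).sum() - (diag ** 2).sum()  # other pairs should not
    return on_diag + off_diag_weight * off_diag

# Usage: features from the frozen PTM and from the PET-applied model.
rng = np.random.default_rng(0)
ptm_features = rng.normal(size=(256, 8))
aligned = transfer_loss(ptm_features, ptm_features.copy())
unrelated = transfer_loss(ptm_features, rng.normal(size=(256, 8)))
print(aligned < unrelated)
```

In this form the loss is small when the tuned features stay correlated with the pre-trained ones dimension by dimension, which is one way to read "inheriting general knowledge" from the PTM.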

Paper Structure

This paper contains 20 sections, 14 equations, 8 figures, 10 tables, 1 algorithm.

Figures (8)

  • Figure 1: Comparisons of (a) prevailing PTM-based CL methods [adam, mcdonnell2024ranpac, read_between_layers] and our Slow And Fast parameter-Efficient tuning (SAFE). The right part (b) illustrates several parameter-efficient tuning (PET) blocks: Adapter [adaptformer], Scale & Shift (SSF) [ssf], and Visual Prompt Tuning (VPT) [VPT].
  • Figure 2: An overview of our SAFE framework. In the first session, PTM transfers knowledge to the slow learner for better generalization. In sessions $t>1$, the fast learner is guided by the slow learner for enhanced plasticity. During inference, robust predictions are made by dynamic aggregation.
  • Figure 3: Comparisons with t-SNE visualization.
  • Figure 4: Validation of the necessity of aggregation on ImageNet-R (IN-R). We report the classification accuracy of test samples from different sessions. Results of the slow learner, the fast learner, and SAFE are presented for comparison.
  • Figure 5: Aggregation weights for the slow learner and fast learner on IN-R.
  • ...and 3 more figures
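
Figures 4 and 5 concern the aggregation of the slow and fast learners at inference. The abstract describes this aggregation as entropy-based, and a minimal sketch of that idea follows. It assumes that each learner's weight comes from a softmax over negative prediction entropies and that the weights mix class probabilities; the function names and the `gamma` sharpness parameter are hypothetical, and the paper's exact rule may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def entropy(probs, eps=1e-12):
    """Shannon entropy of each row of a (batch, classes) probability array."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)

def aggregate(slow_logits, fast_logits, gamma=1.0):
    """Weight each learner per sample: lower entropy gives a larger weight."""
    p_slow = softmax(slow_logits)
    p_fast = softmax(fast_logits)
    h = np.stack([entropy(p_slow), entropy(p_fast)], axis=-1)  # (batch, 2)
    weights = softmax(-gamma * h, axis=-1)                     # rows sum to 1
    combined = weights[:, :1] * p_slow + weights[:, 1:] * p_fast
    return combined, weights

# Usage: the slow learner is confident, the fast learner is undecided.
slow = np.array([[10.0, 0.0, 0.0]])
fast = np.array([[0.0, 0.0, 0.0]])
probs, w = aggregate(slow, fast)
print(probs.argmax(axis=-1), w)
```

Because the weights are computed per test sample, the more confident learner dominates each prediction, which matches the "dynamic" aggregation described in the Figure 2 caption.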