Advancing Prompt-Based Methods for Replay-Independent General Continual Learning

Zhiqi Kang, Liyuan Wang, Xingxing Zhang, Karteek Alahari

TL;DR

This work tackles general continual learning (GCL) under online streams with blurry task boundaries, where prompt-based methods with frozen backbones struggle due to poor initialization and forgetting. It introduces MISA, a plug-in framework combining forgetting-aware Initial Session Adaption (ISA) and non-parametric logit masking to improve prompt learning and stabilize output layers without replay. Through extensive experiments on CIFAR-100, Tiny-ImageNet, and ImageNet-R, MISA achieves substantial gains over strong baselines, including state-of-the-art prompt-based methods, while remaining replay-free and hyperparameter-light. The approach is designed for easy integration with existing prompt-based CL methods, offering robustness, efficiency, and practical applicability for real-world GCL settings.

Abstract

General continual learning (GCL) is a broad concept describing real-world continual learning (CL) problems, which are often characterized by online data streams without distinct transitions between tasks, i.e., blurry task boundaries. Such requirements result in poor initial performance, limited generalizability, and severe catastrophic forgetting, heavily impacting the effectiveness of mainstream GCL models trained from scratch. While a frozen pretrained backbone with appropriate prompt tuning can partially address these challenges, such prompt-based methods remain suboptimal for learning the remaining tunable parameters continually on the fly. To this end, we propose MISA (Mask and Initial Session Adaption) to advance prompt-based methods in GCL. It includes a forgetting-aware initial session adaption that employs pretraining data to initialize prompt parameters and improve generalizability, as well as a non-parametric logit mask on the output layers to mitigate catastrophic forgetting. Empirical results demonstrate substantial performance gains over recent competitors, especially without a replay buffer (e.g., leads of up to 18.39%, 22.06%, and 11.96% on CIFAR-100, Tiny-ImageNet, and ImageNet-R, respectively). Moreover, our approach is a plug-in for prompt-based methods, independent of replay, easy to implement, and free of CL-relevant hyperparameters, serving as a strong baseline for GCL research. Our source code is publicly available at https://github.com/kangzhiq/MISA
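
To make the logit-mask idea concrete, here is a minimal sketch in plain Python. It follows one plausible reading of the Figure 2 caption, where the mask "retains logits of available classes in a batch or a session": logits of classes absent from the current batch are set to negative infinity before the softmax, so those output units receive no gradient from the batch. The function name `masked_cross_entropy`, the choice of a batch-level (rather than session-level) mask, and the toy inputs are assumptions for illustration and are not taken from the authors' code.

```python
import math

def masked_cross_entropy(logits, labels):
    """Mean cross-entropy with classes absent from the batch masked out.

    logits: list of per-sample lists of floats (one score per class).
    labels: list of integer class ids, one per sample.
    """
    present = set(labels)  # classes available in this batch (assumed mask scope)
    total = 0.0
    for row, y in zip(logits, labels):
        # Keep logits of present classes; absent classes get -inf, i.e. zero probability.
        masked = [z if c in present else float("-inf") for c, z in enumerate(row)]
        m = max(masked)  # finite, because the sample's own class is always present
        log_norm = m + math.log(sum(math.exp(z - m) for z in masked))
        total += log_norm - masked[y]
    return total / len(labels)

# Hypothetical toy batch: three classes in the head, only classes 0 and 1 appear.
toy_logits = [[2.0, 1.0, 5.0], [0.5, 1.5, 4.0]]
toy_labels = [0, 1]
toy_loss = masked_cross_entropy(toy_logits, toy_labels)
```

In this toy batch the large logit for class 2 has no effect on the loss, because class 2 does not occur among the labels. The mask has no learnable parameters, which is consistent with the "non-parametric" description above.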

Paper Structure

This paper contains 49 sections, 15 equations, 3 figures, 14 tables.

Figures (3)

  • Figure 1: Problem setup and motivation. Left: illustration of the GCL data stream. Mid: average prediction accuracy at different timesteps in GCL. Right: session 1 accuracy, where we evaluate the retention of knowledge acquired at session 1 after each session. All methods are tested without a replay buffer.
  • Figure 2: An overview of our MISA with a frozen pretrained backbone in GCL. (a) Data in GCL consists of disjoint and blurry classes. (b) Initial session adaption is conducted prior to any CL sessions. Once finished, only the warmed-up prompt parameters are reused for CL. (c) Non-parametric logit mask which retains logits of available classes in a batch or a session.
  • Figure 3: Toy example of implementing sharpness-aware minimization (SAM) with prompt tuning. We performed 2-task CL with 50 randomly sampled classes from ImageNet-1K. The first task is optimized with SAM and the second task uses a standard Adam optimizer. 0.3M, 7M, and 14M represent the number of learnable parameters; the additional parameters come from the last layers of the pretrained backbone $f_r$. SAM works better as more parameters become learnable.
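
For readers unfamiliar with the SAM optimizer referenced in Figure 3, the following is a minimal sketch of one generic SAM update in plain Python. It shows the standard two-step structure: perturb the weights along the normalized gradient by a radius `rho`, then apply the gradient computed at the perturbed point to the original weights. Plain gradient descent as the base update, the values of `lr` and `rho`, and the toy quadratic loss are all assumptions chosen for illustration; the sketch does not reproduce the paper's experimental setup.

```python
import math

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM update on a list of floats `w`.

    grad_fn(w) returns the loss gradient at w as a list of floats.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(x * x for x in g)) + 1e-12  # avoid division by zero
    # Ascent step: approximate worst-case point inside an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: gradient from the perturbed point, applied to the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Hypothetical toy loss L(w) = 0.5 * (10 * w0^2 + w1^2), for illustration only.
def toy_quadratic(w):
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)

def toy_grad(w):
    return [10.0 * w[0], w[1]]

w_final = [1.0, 1.0]
for _ in range(50):
    w_final = sam_step(w_final, toy_grad)
```

With a fixed `rho`, the iterates settle in a small neighborhood of the minimum rather than converging to it exactly, since each gradient is taken at a point displaced by `rho` from the current weights.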