Is Prompt Selection Necessary for Task-Free Online Continual Learning?

Seoyoung Park, Haemin Lee, Hankook Lee

Abstract

Task-free online continual learning has recently emerged as a realistic paradigm for continual learning in dynamic, real-world environments, where data arrive in a non-stationary stream without clear task boundaries and can be observed only once. To address such challenging scenarios, many recent approaches employ prompt selection, an adaptive strategy that selects prompts from a pool based on input signals. However, we observe that such selection strategies often fail to select appropriate prompts, yielding suboptimal results despite additional training of key parameters. Motivated by this observation, we propose SinglePrompt, a simple yet effective framework that eliminates the need for prompt selection and focuses on classifier optimization. Specifically, we simply (i) inject a single prompt into each self-attention block, (ii) employ a cosine similarity-based logit design to alleviate the forgetting effect inherent in the classifier weights, and (iii) mask the logits of classes not exposed in the current minibatch. With this simple task-free design, our framework achieves state-of-the-art performance across various online continual learning benchmarks. Source code is available at https://github.com/efficient-learning-lab/SinglePrompt.
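The classifier side of the method, a cosine similarity-based logit design combined with masking of logits for classes absent from the current minibatch, can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation; the temperature `tau` and the function names are illustrative assumptions.

```python
import numpy as np

def cosine_logits(features, prototypes, tau=0.1):
    """Logits as cosine similarity between L2-normalized features and class
    prototypes, scaled by a temperature tau (tau is an assumed value)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return (f @ w.T) / tau

def masked_cross_entropy(logits, labels):
    """Cross-entropy restricted to classes present in the current minibatch;
    logits of unexposed classes are set to -inf before the softmax."""
    seen = np.unique(labels)
    masked = np.full_like(logits, -np.inf)
    masked[:, seen] = logits[:, seen]
    shifted = masked - masked.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Because unexposed classes receive a probability of exactly zero, their prototype directions receive no gradient pressure from the current minibatch, which is the intended forgetting mitigation.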

Paper Structure

This paper contains 20 sections, 8 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: An overview of the proposed SinglePrompt. When minibatch $\mathcal{B}_t=\{(\mathbf{x}_t^{(i)}, y_t^{(i)})\}_{i=1}^{B_t}$ is provided, it passes through a pretrained Vision Transformer encoder. At the $i$-th self-attention block $f_i$, the input sequence $\mathbf{h}_{i-1}$ is given, and during the attention operation the learnable prompts $\mathbf{p}_i^k$ and $\mathbf{p}_i^v$ are prepended to the key and value, respectively. Only the class token from the encoder's output sequence is used as the final representation. The cosine similarity between this representation and the class prototypes is computed and used as the logit values for prediction. The logit values corresponding to labels not exposed in the current minibatch are masked out before computing the cross-entropy loss.
  • Figure 2: Histograms of prompt selection counts per class on task-free continual learning using CIFAR100 krizhevsky2009learning. In all histograms, the x-axis represents 100 class IDs and the y-axis denotes the number of times each prompt is selected. (a) L2P wang2022learning is evaluated on the 10-task class-incremental setting, and the model selects the top-5 prompts per input from a prompt pool. (b) MVP moon2023online and (c) MISA kang2025advancing are evaluated on the 5-task Si-Blurry setting. Note that, unlike L2P, both methods select only one prompt per input image.
  • Figure 3: Prompt selection failures in task-based methods on CIFAR100 krizhevsky2009learning. (a) Task identification accuracy (%) via prompt selection. The x-axis represents the 100 class IDs, sorted in ascending order of their task identification accuracy, which is shown on the y-axis as the proportion of correct prompt selections. (b) The average and standard deviation of cosine similarities computed from the 7th layer. The x-axis represents the task ID of the input sample and the y-axis indicates the average cosine similarity between each task's samples and their assigned keys. Ideally, all values should be close to 1, but most are near 0. Note that consistent results are observed across other layers as well.
  • Figure 4: Visualization of the L2 norms of the weights for each class of the linear classifier, with blurry classes excluded. The x-axis shows class IDs in the order the model observes them.
  • Figure 5: Ablation study on prompt length. The x-axis denotes the value of $M$.
  • ...and 4 more figures
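The prompt-injection mechanism described in the Figure 1 caption, prepending learnable prompts to the key and value of each self-attention block, can be sketched as a single-head attention step. This is a hedged numpy sketch under assumed shapes; the function name and prompt length are illustrative, and the actual model applies this per head inside a pretrained ViT.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_prompts(q, k, v, p_k, p_v):
    """Single-head attention with prompts prepended to key and value
    (prefix-tuning style). q, k, v: (N, d); p_k, p_v: (M, d) prompts."""
    k_cat = np.concatenate([p_k, k], axis=0)   # (M + N, d)
    v_cat = np.concatenate([p_v, v], axis=0)   # (M + N, d)
    scores = (q @ k_cat.T) / np.sqrt(q.shape[-1])
    # Output length stays N: prompts only enrich what each token attends to.
    return softmax(scores, axis=-1) @ v_cat
```

Note that because the prompts extend only the key/value sequences, the output sequence length is unchanged, so no prompt tokens need to be discarded afterward.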