KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All

Quyen Tran, Hoang Phan, Lam Tran, Khoat Than, Toan Tran, Dinh Phung, Trung Le

TL;DR

This work introduces a novel key-query learning strategy based on orthogonal projection, inspired by model-agnostic meta-learning, to enhance prompt matching efficiency and address the challenge of shifting features.

Abstract

Drawing inspiration from prompt tuning techniques applied to Large Language Models, recent methods based on pre-trained ViT networks have achieved remarkable results in the field of Continual Learning. Specifically, these approaches maintain a set of prompts and allocate a subset of them to learn each task using a key-query matching strategy. However, these methods can be limited by a lack of control over the correlations between old task queries and keys of future tasks, the shift of features in the latent space, and the relative separation of latent vectors learned in independent tasks. In this work, we introduce a novel key-query learning strategy based on orthogonal projection, inspired by model-agnostic meta-learning, to enhance prompt matching efficiency and address the challenge of shifting features. Furthermore, we introduce a One-Versus-All (OVA) prototype-based component that enhances the classification head distinction. Experimental results on benchmark datasets demonstrate that our method achieves results surpassing those of current state-of-the-art approaches by a large margin of up to 20%.

Paper Structure (26 sections, 10 equations, 5 figures, 14 tables, 1 algorithm)

Figures (5)

  • Figure 1: Visualization of the prompt-representation mismatch and semantic drift problems (best viewed in color). When training on task $t$, each example $\mathbf{x}^{tr}_t$ of this task has a weighted prompt $\mathbf{P}_{\mathbf{x}^{tr}_t}$ that depends only on the existing prompts $\mathbf{P}^{1:t}$. Meanwhile, the weighted prompt $\mathbf{P}_{\mathbf{x}^{te}_t}$ of each test example $\mathbf{x}^{te}_t$ of the same task depends on all prompts $\mathbf{P}^{1:T}$. In the case $\mathbf{x}^{te}_t = \mathbf{x}^{tr}_t = \mathbf{x}$, the weighted prompt $\mathbf{P}_\mathbf{x}$ gradually involves more prompts and shifts as $T$ increases.
  • Figure 2: Average weight scores of our method when doing query-key matching (S-CIFAR-100)
  • Figure 3: Rate of correct triggering of KOPPA on S-CIFAR-100 and S-Imagenet-R-5.
  • Figure 4: Training framework of KOPPA. Best viewed in color.
  • Figure 5: Average Accuracy of CODA and KOPPA with varied prompt pool sizes and prompt lengths.
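
The prompt shift described in the Figure 1 caption can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes a simple weighting rule in which each prompt is weighted by the cosine similarity between a query and that prompt's key, and it uses a single fixed query in place of the full set of old-task queries. All names (`weighted_prompt`, `old_keys`, `ortho_key`, and so on) are hypothetical. Under these assumptions, adding a key for a later task changes the weighted prompt of an old example, while a new key projected to be orthogonal to the old query leaves it unchanged.

```python
import numpy as np


def weighted_prompt(query, keys, prompts):
    """Weighted sum of prompts, one weight per key.

    Assumption: the weight of prompt i is the cosine similarity between
    the query and key i. This is an illustrative rule only.
    """
    q = query / np.linalg.norm(query)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    weights = k @ q              # shape: (num_prompts,)
    return weights @ prompts     # shape: (prompt_dim,)


rng = np.random.default_rng(0)
dim, prompt_dim = 8, 4

# Query of an example x from an old task, plus the prompts that existed
# when that task was trained (P^{1:t} in the caption's notation).
query = rng.normal(size=dim)
old_keys = rng.normal(size=(3, dim))
old_prompts = rng.normal(size=(3, prompt_dim))
p_train = weighted_prompt(query, old_keys, old_prompts)

# A later task adds one more key/prompt pair, so the same x now sees P^{1:T}.
new_key = rng.normal(size=dim)
new_prompt = rng.normal(size=(1, prompt_dim))
all_prompts = np.vstack([old_prompts, new_prompt])

p_test = weighted_prompt(query, np.vstack([old_keys, new_key]), all_prompts)
shift_unconstrained = np.linalg.norm(p_test - p_train)

# Remove the component of the new key that lies along the old query, so the
# new key is orthogonal to it and receives zero weight for this example.
q_unit = query / np.linalg.norm(query)
ortho_key = new_key - (new_key @ q_unit) * q_unit
p_test_ortho = weighted_prompt(query, np.vstack([old_keys, ortho_key]), all_prompts)
shift_orthogonal = np.linalg.norm(p_test_ortho - p_train)

print(f"shift with unconstrained new key: {shift_unconstrained:.4f}")
print(f"shift with orthogonal new key:    {shift_orthogonal:.2e}")
```

In this toy setting the shift vanishes because the projected key has zero similarity with the old query. The sketch only shows why orthogonality between new keys and old queries removes the mismatch; it does not reproduce the paper's training procedure.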