Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need

Martin Wistuba, Prabhu Teja Sivaprasad, Lukas Balles, Giovanni Zappella

TL;DR

The paper questions the widespread use of prompt tuning as the PEFT choice in continual learning with pretrained transformers and demonstrates that LoRA-based PEFT variants consistently outperform prompt-based approaches across domain- and class-incremental benchmarks. It introduces drop-in LoRA-based variants of two prominent CL methods, S-LoRA for S-Prompts and L2L for Learning to Prompt (L2P), and shows substantial accuracy gains with minimal inference overhead. Through comprehensive experiments on diverse datasets (CORe50, DomainNet, Split CIFAR-100, Tiny ImageNet) and careful ablations, the authors argue that prompt tuning is not inherently suited to continual learning and advocate adopting LoRA for practical CL deployments. The work emphasizes the importance of ablating architectural choices, the potential for improved real-world impact, and the need to explore PEFT techniques beyond prompt tuning in CL.

Abstract

Recent Continual Learning (CL) methods have combined pretrained Transformers with prompt tuning, a parameter-efficient fine-tuning (PEFT) technique. We argue that the choice of prompt tuning in prior works was an undefended and unablated decision, which has been uncritically adopted by subsequent research, but warrants further research to understand its implications. In this paper, we conduct this research and find that the choice of prompt tuning as a PEFT method hurts the overall performance of the CL system. To illustrate this, we replace prompt tuning with LoRA in two state-of-the-art continual learning methods: Learning to Prompt and S-Prompts. These variants consistently achieve higher accuracy across a wide range of domain-incremental and class-incremental benchmarks, while being competitive in inference speed. Our work highlights a crucial argument: unexamined choices can hinder progress in the field, and rigorous ablations, such as the PEFT method, are required to drive meaningful adoption of CL techniques in real-world applications.
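
To make the contrast between the two PEFT techniques concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single frozen linear layer of width 768 with 197 input tokens (the usual ViT-B/16 sizes at 224x224 resolution); the rank `r`, the number of prompts `n_prompts`, the scaling `alpha`, and the helper names `lora_linear` and `prepend_prompts` are all hypothetical choices made for illustration.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha=1.0):
    """Apply a frozen weight W plus a trainable low-rank update B @ A.

    x: (seq_len, d_in), W: (d_out, d_in), B: (d_out, r), A: (r, d_in).
    """
    r = A.shape[0]
    W_eff = W + (alpha / r) * (B @ A)  # could be merged once before inference
    return x @ W_eff.T

def prepend_prompts(tokens, prompts):
    """Prompt tuning: prepend trainable prompt tokens to the token sequence."""
    return np.concatenate([prompts, tokens], axis=0)

# Illustrative sizes (assumptions, not taken from the paper's experiments).
d, r, n_prompts, seq_len = 768, 4, 8, 197
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable
B = np.zeros((d, r))                 # trainable, zero-initialised
x = rng.normal(size=(seq_len, d))    # input tokens

# With B = 0 the adapted layer reproduces the frozen layer exactly.
out = lora_linear(x, W, A, B)
frozen_out = x @ W.T

# Prompt tuning leaves W untouched but lengthens the sequence.
prompts = rng.normal(size=(n_prompts, d))  # trainable
extended = prepend_prompts(x, prompts)

lora_params = A.size + B.size   # 2 * r * d
prompt_params = prompts.size    # n_prompts * d
```

With these sizes both variants train the same number of parameters for this layer, yet they act differently: the LoRA update modifies the weight itself and can be folded into `W`, while the prompts add tokens that every subsequent layer must process.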

Paper Structure (39 sections, 8 equations, 7 figures, 5 tables)

This paper contains 39 sections, 8 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Is the choice of prompt tuning justified? A ViT-B/16 is trained on the combined training set of all the datasets for Split CIFAR-100 and DomainNet. The training loss dynamics are shown on the left and the test performance on the right, for fine-tuning (training all parameters), prompt tuning (training the prompt tokens and the classifier layer), and LoRA (training the low-rank adapter and the classifier layer). Prompt tuning converges to a higher loss (left) and performs poorly compared to LoRA (right), while being comparably parameter-efficient. Yet a large body of continual learning literature builds on prompt-tuning-based techniques. We study whether this design choice is justified.
  • Figure 2: Speed as a function of the number of trainable parameters for Split CIFAR-100. The prompt-based methods are faster only for a small number of trainable parameters.
  • Figure 3: Performance under varying hyperparameters for Split CIFAR-100. Increasing the number of trainable parameters does not necessarily improve performance. For L2P, increasing the number of trainable parameters improves performance but does not reach the performance that L2L achieves with far fewer parameters. For the S-X family, increasing the number of parameters is clearly advantageous for S-LoRA, whereas S-Prompts performs worse.
  • Figure 4: We report the average accuracy obtained after each update. The ranking of LoRA-based vs. prompt-based methods does not change.
  • Figure 5: S-Prompts (S-Pr) shows no positive change when using the prompts estimated for the first dataset to extract the feature representation (S-Pr++). However, using the LoRA modules of the first dataset (S-Lo++) to extract the features gives a big boost in identifying the right expert model and hence average accuracy.
  • ...and 2 more figures