Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation

Abhinav Jain, Swarat Chaudhuri, Thomas Reps, Chris Jermaine

TL;DR

This paper proposes Low-Rank Prompt Adaptation (LoPA), a prompt-tuning-based approach that performs on par with state-of-the-art PEFT methods and full fine-tuning while being more parameter-efficient and not requiring a server-based adapter.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) has become the standard for customizing Foundation Models (FMs) to user-specific downstream tasks. However, typical PEFT methods require storing multiple task-specific adapters, creating scalability issues as these adapters must be housed and run at the FM server. Traditional prompt tuning offers a potential solution by customizing FMs through task-specific input prefixes, but it underperforms compared to other PEFT methods like LoRA. To address this gap, we propose Low-Rank Prompt Adaptation (LoPA), a prompt-tuning-based approach that performs on par with state-of-the-art PEFT methods and full fine-tuning while being more parameter-efficient and not requiring a server-based adapter. LoPA generates soft prompts by balancing between sharing task-specific information across instances and customization for each instance. It uses a low-rank decomposition of the soft-prompt component encoded for each instance to achieve parameter efficiency. We provide a comprehensive evaluation on multiple natural language understanding and code generation and understanding tasks across a wide range of foundation models with varying sizes.
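
The mechanism described in the abstract can be illustrated with a small sketch. It builds a soft prompt $\textbf{Z}$ from a shared task-level component and a low-rank instance-level component, then prepends it to the input embeddings $\textbf{X}_e$. Several details are assumptions made for illustration and are not stated in the abstract: the elementwise sigmoid gate used to combine the two components, the names `z_shared`, `u`, and `v`, the toy sizes, and the random matrices standing in for learned parameters and encoder outputs. It should be read as one plausible reading, not as the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x p) matrix, both given as nested lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    """Random stand-in for learned parameters or encoder outputs."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def lopa_soft_prompt(z_shared, u, v):
    """Combine a shared soft prompt (m x d) with a low-rank instance component.

    The instance component is the product u (m x r) times v (r x d).
    ASSUMPTION: the two components are combined with an elementwise
    sigmoid gate; the abstract does not specify the combination rule.
    """
    z_instance = matmul(u, v)
    return [
        [zs * sigmoid(zi) for zs, zi in zip(row_s, row_i)]
        for row_s, row_i in zip(z_shared, z_instance)
    ]


def prepend_soft_prompt(z, x_e):
    """X = concat(Z | X_e): soft-prompt rows followed by input-embedding rows."""
    return z + x_e


# Toy sizes (hypothetical): m soft-prompt tokens, embedding width d, rank r, n input tokens.
m, d, r, n = 10, 16, 2, 5
rng = random.Random(0)

z_shared = random_matrix(m, d, rng)  # shared across all instances of a task
u = random_matrix(m, r, rng)         # would come from an encoder of the instance
v = random_matrix(r, d, rng)         # would come from an encoder of the instance
x_e = random_matrix(n, d, rng)       # embedded input prompt

z = lopa_soft_prompt(z_shared, u, v)
x = prepend_soft_prompt(z, x_e)

# Values needed for the instance-specific component, with and without the factorization.
full_size = m * d
low_rank_size = r * (m + d)

print(len(x), len(x[0]))         # 15 16
print(full_size, low_rank_size)  # 160 52
```

With these toy sizes the factorized instance component needs 52 values instead of 160, which is the kind of saving a low-rank decomposition is meant to provide. The soft prompt only changes the model input, so in this sketch nothing task-specific has to be attached to the model weights themselves.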

Paper Structure

This paper contains 14 sections, 5 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: A schematic illustrating how typical PEFT methods like LoRA achieve personalization of a foundation model for multiple tasks, such as Yes/No text classification or code completion, during inference.
  • Figure 2: An illustration of LoPA. No task-specific adapters need to be stored on the server. $|$ represents the concatenation of the soft prompt $\textbf{Z}$ and the input prompt $\textbf{X}_e$, i.e., $\textbf{X}=\textrm{concat}(\textbf{Z}|\textbf{X}_e)$.
  • Figure 3: Performance comparison of baselines as a function of $m$ on (a)-(c) GLUE benchmark and (d) CruxEval-O (with DeepseekCoder-1.3B as FM). Tunable parameters shown relative to the method with the most. Higher performance and fewer parameters indicate better results.
  • Figure 4: Performance of LoPA as a function of rank shown for $m=10$. (a) GLUE Benchmarks and (b) CruxEval tasks $(I, O)$ where ds-1.3 denotes DeepseekCoder-1.3B and phi-2 denotes Phi2-2.7B models. Higher performance and fewer tunable parameters indicate better results.
  • Figure 5: Ablation for Encoder in LoPA with DeepseekCoder-1.3B as the foundation model.
  • ...and 3 more figures