Efficient Prompt Tuning by Multi-Space Projection and Prompt Fusion

Pengxiang Lan, Enneng Yang, Yuting Liu, Guibing Guo, Jianzhe Zhao, Xingwei Wang

TL;DR

This work targets the efficiency-accuracy dilemma in prompt tuning for large language models. It introduces Efficient Prompt Tuning (EPT), which decomposes the soft prompt into a short prompt plus low-rank components, then fuses semantic knowledge via a fusion module and reweights the prompt across multiple subspaces with a gating network. A reconstructed joint prompt replaces the original, and 4-bit quantization reduces memory usage while gradients remain on the prompt parameters. Empirical results on GLUE and SuperGLUE show consistent performance gains with significantly reduced training time across a range of model scales, highlighting the method's robustness and practical applicability. The approach advances parameter-efficient fine-tuning by addressing both efficiency and task-wise adaptability, with potential for broader multi-task and larger-scale deployment.

Abstract

Prompt tuning is a promising method for fine-tuning a pre-trained language model without retraining its large-scale parameters. Instead, it attaches a soft prompt to the input text, so that downstream tasks can be adapted by learning only the embeddings of the prompt tokens. Nevertheless, existing methods still face two challenges: (i) it is hard to balance accuracy and efficiency. A longer (shorter) soft prompt generally leads to better (worse) accuracy, but at the cost of more (less) training time. (ii) Performance may be inconsistent across different downstream tasks. We attribute this to a single embedding space being responsible for the different requirements of downstream tasks. To address these issues, we propose an Efficient Prompt Tuning method (EPT) based on multi-space projection and prompt fusion. Specifically, it decomposes a given soft prompt into a shorter prompt and two low-rank matrices, significantly reducing training time. Accuracy is also improved by using the low-rank matrices and the short prompt as additional knowledge sources to enrich the semantics of the original short prompt. In addition, we project the soft prompt into multiple subspaces to improve performance consistency, and adaptively learn the combination weights of the different spaces through a gating network. Experiments on 13 natural language processing downstream tasks show that our method significantly and consistently outperforms 11 comparison methods, with relative improvements of up to 12.9% and a 14% reduction in training time.
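The decomposition step can be illustrated with a small parameter-count sketch. It assumes DEPT-style shapes consistent with the Figure 2 caption: a short prompt of m tokens (m × d) plus low-rank factors A (s × r) and B (r × d), whose product is added element-wise to the frozen s × d input embedding. The function names and the example sizes (d = 768 as in T5-Base, l = 100, m = 60, s = 256, r = 30) are illustrative assumptions, not the paper's reported configuration.

```python
# Sketch of EPT-style prompt decomposition, assuming DEPT-style shapes.
# l: full prompt length, m: short prompt length, s: max input length,
# d: embedding size, r: rank. All names and sizes here are hypothetical.

Matrix = list[list[float]]


def full_prompt_params(l: int, d: int) -> int:
    """Trainable parameters of a vanilla soft prompt (l x d)."""
    return l * d


def decomposed_params(m: int, s: int, d: int, r: int) -> int:
    """Short prompt (m x d) plus low-rank factors A (s x r) and B (r x d)."""
    return m * d + s * r + r * d


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def update_input_embedding(x: Matrix, a: Matrix, b: Matrix) -> Matrix:
    """Add the low-rank product A @ B element-wise to the frozen embedding X (s x d)."""
    delta = matmul(a, b)
    return [[x[i][j] + delta[i][j] for j in range(len(x[0]))] for i in range(len(x))]


if __name__ == "__main__":
    d, l, m, s, r = 768, 100, 60, 256, 30  # illustrative sizes only
    print(full_prompt_params(l, d))         # 76800
    print(decomposed_params(m, s, d, r))    # 76800
    print(l + s, m + s)                     # 356 316: tokens fed to the PLM
```

With these assumed sizes, the decomposed form keeps the same 76,800 trainable parameters as a 100-token prompt, while the sequence fed to the PLM shrinks from 356 to 316 tokens; a shorter input sequence is the intuition behind the reduced training time.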

Paper Structure

This paper contains 35 sections, 9 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Average performance ($y$-axis) against the number of trainable parameters ($x$-axis) on the GLUE and SuperGLUE benchmarks. All models use T5-Base.
  • Figure 2: Overview of the EPT model. The whole soft prompt is decomposed into a short prompt and two low-rank matrices. The two low-rank matrices are multiplied, and their product is added element-wise to the frozen input text embedding. The Multi-Space Projection module maps the short prompt to multiple subspaces to address diverse downstream task requirements, while the Prompt Fusion module enriches its semantic knowledge. Finally, EPT generates a joint prompt representation that supersedes the original prompt. The new prompt and the updated input text embedding are concatenated and fed into the PLM.
  • Figure 3: Performance of EPT (Ours), DEPT, and PT on different datasets with T5-11B and Llama2-7B.
  • Figure 4: On the GLUE benchmark, (a) performance of EPT (Ours), MPT, and PT under different K-shot settings; (b) training time and performance of EPT, DEPT, and PT for different short-prompt lengths in EPT and DEPT.
  • Figure 5: Performance as a function of the number of subspaces in the Multi-Space Projection module on the GLUE and SuperGLUE benchmarks.
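The gated combination performed by the Multi-Space Projection module can be sketched as follows. The summary does not say how the subspaces or the gating network are parameterized, so this sketch assumes K learned d × d projection matrices and a linear softmax gate over the mean-pooled short prompt; `multi_space_projection`, `mean_pool`, and the `gate` matrix are hypothetical names.

```python
import math

# Sketch of a gated multi-space projection over the short prompt.
# Assumption: K learned d x d projections W_k and a linear softmax gate on
# the mean-pooled prompt. The paper's exact parameterization may differ.

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(logits: list[float]) -> list[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(p: Matrix) -> list[float]:
    """Average the m prompt-token vectors into one d-dimensional vector."""
    return [sum(row[j] for row in p) / len(p) for j in range(len(p[0]))]


def multi_space_projection(p: Matrix, projections: list[Matrix], gate: Matrix):
    """Return (sum_k g_k * (P @ W_k), g), where the gate weights g sum to 1.

    p: short prompt (m x d); projections: K matrices W_k (d x d);
    gate: d x K weights mapping the pooled prompt to K logits.
    """
    pooled = mean_pool(p)
    logits = [sum(pooled[i] * gate[i][k] for i in range(len(pooled)))
              for k in range(len(projections))]
    weights = softmax(logits)
    projected = [matmul(p, w) for w in projections]
    combined = [[sum(g * proj[i][j] for g, proj in zip(weights, projected))
                 for j in range(len(p[0]))] for i in range(len(p))]
    return combined, weights


if __name__ == "__main__":
    prompt = [[1.0, 2.0]]                   # m = 1 token, d = 2
    identity = [[1.0, 0.0], [0.0, 1.0]]
    double = [[2.0, 0.0], [0.0, 2.0]]
    out, g = multi_space_projection(prompt, [identity, double], [[0.0, 0.0], [0.0, 0.0]])
    print(g)    # [0.5, 0.5]
    print(out)  # [[1.5, 3.0]]
```

Because the softmax keeps the subspace weights positive and summing to 1, the output is a convex combination of the K projected prompts, and the gate can shift that mix per input; whether EPT uses exactly this form is not stated in this summary.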