Efficient and Effective Prompt Tuning via Prompt Decomposition and Compressed Outer Product

Pengxiang Lan, Haoyu Xu, Enneng Yang, Yuliang Liang, Guibing Guo, Jianzhe Zhao, Xingwei Wang

TL;DR

This work tackles two key bottlenecks in prompt tuning: the lack of intrinsic semantic coupling among soft prompt tokens and the high cost of long prompts. It introduces LAMP, a Low-parameters Prompt Tuning framework that decomposes the soft prompt via Truncated SVD and enriches token interactions with a compressed outer product, followed by average pooling to maintain efficiency. By training only the low-rank components $\mathbf{U}_{[:r]}$, $\mathbf{Q}_{[:r]}$, and $\mathbf{V}_{[:r]}$ while freezing the backbone, LAMP achieves strong performance gains across multiple model scales and datasets with substantially fewer trainable parameters. Across eight SuperGLUE/GLUE tasks and backbones including T5-Small, T5-Base, T5-Large, T5-11B, and Llama2-7B, LAMP outperforms state-of-the-art PT and LoRA baselines, reduces memory and compute, and provides clearer interpretability of the prompt representations, underscoring its practical impact for scalable, efficient prompt-based learning in NLP.

Abstract

Prompt tuning (PT) offers a cost-effective alternative to fine-tuning large-scale pre-trained language models (PLMs), requiring only a few parameters in soft prompt tokens prepended to the input text. However, existing PT approaches face two significant issues: (i) They overlook intrinsic semantic associations between soft prompt tokens, which leaves the tokens highly discrete with limited interactions and reduces the model's comprehension and effectiveness on complex tasks. (ii) Complex downstream tasks require long soft prompts to perform well, but prompt length correlates positively with memory usage and computational cost. Achieving both high efficiency and high performance remains an ongoing challenge. To address these issues, we propose a novel Low-parameters prompt tuning (LAMP) method, which leverages prompt decomposition and a compressed outer product. Specifically, the prompt decomposition module employs Truncated SVD to reduce the number of trainable parameters and significantly lower the dimensionality of the soft prompt parameter space. A compressed outer product module then facilitates multiple interactions among prompt tokens, exploring their intrinsic associations to enhance knowledge representation. Finally, LAMP uses average pooling to reduce memory usage and training/inference time. Extensive experiments across six architectures and eight datasets demonstrate that LAMP outperforms state-of-the-art PT-based and LoRA-based methods in both performance and efficiency.
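
The two efficiency steps named in the abstract, Truncated SVD of the soft prompt and average pooling, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the sizes `l`, `d`, `r`, and `p`, the helper names `truncated_svd` and `average_pool`, and the choice to pool over the token axis are all assumptions made for illustration. The sketch also treats the middle factor as the vector of singular values, which may differ from how the paper defines $\mathbf{Q}_{[:r]}$, and it omits the compressed outer product module because its exact form is not given here.

```python
import numpy as np


def truncated_svd(prompt, r):
    """Return rank-r factors (U_r, s_r, Vt_r) of an (l, d) prompt matrix."""
    U, s, Vt = np.linalg.svd(prompt, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]


def average_pool(prompt, p):
    """Average each group of p consecutive tokens: (l, d) -> (l // p, d)."""
    l, d = prompt.shape
    if l % p != 0:
        raise ValueError("prompt length must be divisible by the pooling size")
    return prompt.reshape(l // p, p, d).mean(axis=1)


# Hypothetical sizes: 100 prompt tokens, hidden size 768, rank 8, pooling size 4.
l, d, r, p = 100, 768, 8, 4
rng = np.random.default_rng(0)
prompt = rng.standard_normal((l, d))

# Keep only the top-r factors; in a trainable setting these would be the parameters.
U_r, s_r, Vt_r = truncated_svd(prompt, r)

# Rebuild a rank-r prompt from the factors, then shorten it by pooling.
low_rank_prompt = (U_r * s_r) @ Vt_r
pooled_prompt = average_pool(low_rank_prompt, p)

full_params = l * d
low_rank_params = U_r.size + s_r.size + Vt_r.size
print(full_params, low_rank_params, pooled_prompt.shape)
```

With these illustrative sizes, the three factors hold 6,952 values, compared with 76,800 for the full prompt matrix, and pooling shortens the prompt from 100 to 25 tokens.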

Paper Structure

This paper contains 31 sections, 8 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: (a) and (b) show t-SNE clustering visualizations of the prompts learned by original prompt tuning after training on the MultiRC and COPA datasets using the T5-Base model. Source prompt tokens are initialized from sampled vocabulary, and the prompt length is set to 100.
  • Figure 2: (a) Average performance of the T5 models across the SuperGLUE benchmark. (b) Impact of prompt length on performance and trainable parameters on the WiC dataset of the SuperGLUE benchmark.
  • Figure 3: (a) Conventional prompt tuning (Lester et al., 2021). (b) Overview of our proposed LAMP. It decomposes the vanilla prompt to construct a new low-dimensional prompt, captures the intrinsic semantic associations between prompt tokens, and finally reduces computational costs through average pooling.
  • Figure 4: (a) and (b): performance of all baselines with the number of inherent ranks $r \in\{4, 6, 8, 10, 12, 20\}$ on the SuperGLUE benchmark. (c) and (d): performance of different baselines as the prompt length varies over $l \in \{20, 100, 200\}$. All results are averaged over three runs, each with a different random seed.
  • Figure 5: (a) and (b): performance of different methods across datasets on T5-11B and Llama2-7B. (c): variation of training time and performance with different average pooling blocks $p$.
  • ...and 6 more figures