SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings

MohammadAli SadraeiJavaeri; Ehsaneddin Asgari; Alice Carolyn McHardy; Hamid Reza Rabiee

TL;DR

The paper tackles the inefficiency of soft prompt tuning in data-limited settings by introducing SuperPos-Prompt, a reparameterization that forms each prompt token as a weighted superposition of multiple pretrained token embeddings. This enables more stable and faster learning without relying on pre-trained prompts. The paper also shows that removing dropout from the frozen network improves convergence across prompt-tuning methods. Empirically, SuperPos-Prompt yields average score gains over Residual Prompt tuning of $+6.4$ on T5-Small and $+5.0$ on T5-Base across 13 GLUE/SuperGLUE tasks, occasionally surpassing full fine-tuning, and its performance plateaus at around $m=128$ sampled embeddings. The work highlights a practical, data-efficient pathway for parameter-efficient fine-tuning (PEFT) in NLP and suggests future exploration of prompts drawn from pretrained sources and of multi-modal extensions.
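The superposition idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: each prompt token $\mathbf{p}_i$ is the weighted sum $\sum_j p'_{ij}\,\mathbf{e}_j$ of $m$ sampled vocabulary embeddings, and the resulting prompt is prepended to the input embeddings. The helper names (sample_embeddings, superpos_token, superpos_prompt, prepend_prompt), the uniform sampling without replacement, and the assumption that all prompt tokens share the same $m$ sampled embeddings are ours; the toy numbers are made up.

```python
import random

def sample_embeddings(vocab_embeddings, m, seed=0):
    """Pick m distinct rows of the vocabulary embedding table.

    ASSUMPTION: uniform sampling without replacement. Rows are copied because
    the sampled embeddings are tuned, while the vocabulary table stays frozen.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(vocab_embeddings)), m)
    return [list(vocab_embeddings[i]) for i in idx]

def superpos_token(sampled, weights):
    """One prompt token: p_i = sum_j weights[j] * sampled[j]."""
    if len(sampled) != len(weights):
        raise ValueError("need exactly one weight per sampled embedding")
    dim = len(sampled[0])
    return [sum(w * e[k] for w, e in zip(weights, sampled)) for k in range(dim)]

def superpos_prompt(sampled, weight_rows):
    """All n prompt tokens. Multiplying the n x m weight rows by the m x d
    sampled embeddings is the same as a linear up-projection R^m -> R^d whose
    weights start out as the sampled embeddings."""
    return [superpos_token(sampled, w) for w in weight_rows]

def prepend_prompt(prompt, input_embeddings):
    """Concatenate prompt embeddings in front of the input token embeddings."""
    return [list(p) for p in prompt] + [list(x) for x in input_embeddings]

# Toy usage with made-up numbers: a 4-word vocabulary of 3-d embeddings.
vocab = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
E = sample_embeddings(vocab, m=2)      # shared by every prompt token (assumed)
W = [[0.5, 0.5], [1.0, 0.0]]           # hypothetical p'_1 and p'_2
P = superpos_prompt(E, W)              # 2 prompt tokens, each 3-d
model_input = prepend_prompt(P, [[0.2, 0.3, 0.4]])
```

During training, both the sampled rows in E and the weight rows in W would be updated while the language model itself stays frozen; the matrix view in superpos_prompt is why the method can also be read as a linear up-projection initialized with sampled embeddings.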

Abstract

Soft prompt tuning techniques have recently gained traction as an effective strategy for the parameter-efficient tuning of pretrained language models, since they minimize the number of model parameters that must be adjusted. Despite their growing use, achieving optimal tuning with soft prompts, especially on smaller datasets, remains a substantial challenge. This study makes two contributions in this domain: (i) we introduce SuperPos-Prompt, a new reparameterization technique that uses the superposition of multiple pretrained vocabulary embeddings to improve the learning of soft prompts. Our experiments across several GLUE and SuperGLUE benchmarks consistently show SuperPos-Prompt's superiority over Residual Prompt tuning, with average score increases of $+6.4$ on T5-Small and $+5.0$ on T5-Base, along with faster convergence. Remarkably, SuperPos-Prompt occasionally outperforms even full fine-tuning. (ii) Additionally, we demonstrate improved performance and faster convergence when dropout is omitted from the frozen network, yielding consistent improvements across various scenarios and tuning methods.
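To make contribution (ii) concrete, the toy sketch below contrasts a frozen layer with and without dropout. It only illustrates what "omitting dropout from the frozen network" changes: with dropout active, the same input can produce different outputs from the frozen network; with the rate set to 0.0, the frozen network is a deterministic function of its input. The frozen_layer function (a fixed x -> 2x map) and the 0.1 rate are hypothetical choices for illustration.

```python
import random

def inverted_dropout(values, rate, rng):
    """Zero each value with probability `rate` and rescale survivors by 1/(1-rate).
    With rate == 0.0 this is the identity."""
    if rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def frozen_layer(values, dropout_rate, rng):
    """Hypothetical frozen block: fixed weights (here just x -> 2x) followed by dropout."""
    return inverted_dropout([2.0 * v for v in values], dropout_rate, rng)

rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
# Dropout active in the frozen network: repeated calls on the same input can differ.
noisy = [frozen_layer(x, 0.1, rng) for _ in range(5)]
# Dropout removed: every call returns the same output.
clean = [frozen_layer(x, 0.0, rng) for _ in range(5)]
```

If the T5 backbone is loaded through Hugging Face Transformers, its configuration exposes a `dropout_rate` field that could be set to 0.0 to get the same effect; the summary does not say how the authors disabled dropout.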

Paper Structure (15 sections, 5 equations, 2 figures, 3 tables)

Background
Approach
Comparison to similar prompt tuning approaches
Experiments
Dataset
Base language model
Ablation Study
Experiment Setup
Results
Conclusions
Limitations
Appendix
T5 original checkpoint
Softmax Effect
GPT3 few-shot performance

Figures (2)

Figure 1: Overview of different prompt tuning methods. (a.) Simple Prompt Tuning: adjusts the prompt embeddings, $\mathbf{P}$, which are then concatenated with the input embeddings. (b.) SuperPos-Prompt Tuning: forms each prompt token as a weighted sum of a mixture of embeddings $\mathbf{e}_j$, $1 \leq j \leq m$, with weights given by $\mathbf{p}'_i$. All $\mathbf{e}_j$ and the vector $\mathbf{p}'_i$ are tuned jointly. (c.) Residual Prompt Tuning: uses an autoencoder reparameterization with a residual connection. (d.) SuperPos-Prompt can also be interpreted as a linear up-projection initialized with sampled embeddings. (e.) Multi-task Subspace Finding: an autoencoder is trained over pre-trained prompts. (f.) Intrinsic Subspace Tuning: uses the pre-trained decoder from 'Multi-task Subspace Finding' to map lower-dimensional prompts to the model's dimension.
Figure 2: Results of our experiments using 'T5v1.1 Base LM-Adapted' as the base model. (a) Learning curves comparing the effect of dropout on SuperPos-Prompt for selected tasks. (b) Learning curves comparing various prompt tuning methods across selected tasks, all trained without dropout. (c) Ablation study on the number of sampled tokens ($m$) for SuperPos-Prompt; the x-axis shows the sampled token count and the y-axis the peak performance on the relevant metric. (d) Cosine similarity of the superposition weights of each prompt token across all tasks.
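Panel (d) of Figure 2 compares superposition weights across tasks. One plausible reading, assumed here rather than taken from the paper, is that for a fixed prompt token $i$ the learned weight vectors $\mathbf{p}'_i$ from different tasks are compared pairwise by cosine similarity. The sketch below computes that; the function names, task names, and weight values are all made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm

def token_similarity_across_tasks(weights_by_task, token):
    """weights_by_task maps a task name to its learned weight vectors [p'_1, ..., p'_n].
    Returns the cosine similarity of p'_token for every pair of tasks."""
    tasks = sorted(weights_by_task)
    return {
        (a, b): cosine(weights_by_task[a][token], weights_by_task[b][token])
        for i, a in enumerate(tasks)
        for b in tasks[i + 1:]
    }

# Made-up weights for illustration only (m = 3, one prompt token per task).
weights = {
    "task_a": [[1.0, 0.0, 1.0]],
    "task_b": [[1.0, 0.0, 1.0]],
    "task_c": [[0.0, 1.0, 0.0]],
}
sims = token_similarity_across_tasks(weights, token=0)
```

Here task_a and task_b put their weight on the same sampled embeddings, so their similarity is close to 1, while task_c uses a disjoint embedding and scores 0 against both.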