SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings

MohammadAli SadraeiJavaeri, Ehsaneddin Asgari, Alice Carolyn McHardy, Hamid Reza Rabiee

TL;DR

The paper tackles the inefficiency of soft prompt tuning in data-limited settings by introducing SuperPos-Prompt, a reparameterization that forms each prompt token as a weighted superposition of multiple pretrained token embeddings. This enables more stable and rapid learning without relying on pre-trained prompts. The paper also shows that removing dropout from the frozen network improves convergence across prompt-tuning methods. Empirically, SuperPos-Prompt yields average gains over Residual Prompt Tuning of $+6.4$ on T5-Small and $+5.0$ on T5-Base across 13 GLUE/SuperGLUE tasks, occasionally surpassing full fine-tuning, and its performance plateaus at around $m=128$ sampled embeddings. The work highlights a practical, data-efficient pathway for parameter-efficient fine-tuning (PEFT) in NLP and suggests future exploration of prompts built from pre-trained sources and of multi-modal extensions.
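
Reading Figure 1 (panels b and d), the superposition for prompt token $i$ can be written as below. The matrix form is our restatement of the caption rather than an equation quoted from the paper; $d$ (embedding size), $n$ (number of prompt tokens), and $\bm{E}$ are notation we assume here.

$$
\bm{p}_i \;=\; \sum_{j=1}^{m} p'_{i,j}\,\bm{e}_j \;=\; \bm{E}\,\bm{p}'_i,
\qquad
\bm{E} = [\bm{e}_1, \dots, \bm{e}_m] \in \mathbb{R}^{d \times m},
\quad
\bm{p}'_i \in \mathbb{R}^{m}
$$

Assuming the same $m$ embeddings are shared by all $n$ prompt tokens (the linear up-projection reading of panel d) and that both $\bm{E}$ and the $\bm{p}'_i$ are tuned, the trainable parameter count would be $dm + mn$, compared with $nd$ for simple prompt tuning.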

Abstract

Soft prompt tuning techniques have recently gained traction as an effective strategy for parameter-efficient tuning of pretrained language models, as they minimize the number of model parameters that must be adjusted. Despite their growing use, achieving optimal tuning with soft prompts, especially on smaller datasets, remains a substantial challenge. This study makes two contributions in this domain: (i) we introduce SuperPos-Prompt, a new reparameterization technique that employs the superposition of multiple pretrained vocabulary embeddings to improve the learning of soft prompts. Our experiments across several GLUE and SuperGLUE benchmarks consistently show SuperPos-Prompt's superiority over Residual Prompt Tuning, with average score increases of $+6.4$ on T5-Small and $+5.0$ on T5-Base, along with faster convergence. Remarkably, SuperPos-Prompt occasionally outperforms even full fine-tuning. (ii) Additionally, we demonstrate improved performance and faster convergence by omitting dropout from the frozen network, yielding consistent improvements across various scenarios and tuning methods.
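
The superposition step can be illustrated with a minimal sketch in plain Python, assuming lists stand in for tensors and that the $m$ embeddings are drawn uniformly without replacement from the vocabulary table (the summary does not say how sampling is done). The function names `sample_embeddings`, `superpos_token`, and `superpos_prompt` are hypothetical. Only the forward computation is shown; in actual training both the sampled embeddings and the weight rows would be updated by gradient descent, which this sketch does not implement.

```python
import random
from typing import List

Vector = List[float]


def sample_embeddings(vocab: List[Vector], m: int, seed: int = 0) -> List[Vector]:
    """Copy m distinct rows of the frozen vocabulary embedding table.

    Uniform sampling without replacement is an assumption of this sketch.
    Copies are returned so that tuning them never touches the frozen table.
    """
    rng = random.Random(seed)
    rows = rng.sample(range(len(vocab)), m)
    return [list(vocab[r]) for r in rows]


def superpos_token(sampled: List[Vector], weights: Vector) -> Vector:
    """One prompt token: p_i = sum_j weights[j] * e_j."""
    if len(weights) != len(sampled):
        raise ValueError("expected one weight per sampled embedding")
    dim = len(sampled[0])
    return [sum(w * e[k] for w, e in zip(weights, sampled)) for k in range(dim)]


def superpos_prompt(sampled: List[Vector], weight_rows: List[Vector]) -> List[Vector]:
    """All n prompt tokens; every token reuses the same m sampled embeddings.

    The result would then be concatenated with the input embeddings.
    """
    return [superpos_token(sampled, w) for w in weight_rows]


if __name__ == "__main__":
    # Toy 10-token vocabulary with 3-dimensional embeddings (made-up values).
    vocab = [[float(i), float(-i), 0.5 * i] for i in range(10)]
    E = sample_embeddings(vocab, m=4)
    # Two prompt tokens: a one-hot weight row and a uniform average.
    P_prime = [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]
    prompt = superpos_prompt(E, P_prime)
    print(prompt[0] == E[0])  # True: a one-hot row selects a single embedding
```

A one-hot weight row reproduces a single sampled embedding, while denser rows mix several of them, which is what lets each prompt token start inside the span of real token embeddings.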

Paper Structure (15 sections, 5 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Overview of different prompt tuning methods. (a) Simple Prompt Tuning: adjusts the prompt embeddings ${\bm{P}}$, which are then concatenated with the input embeddings. (b) SuperPos-Prompt Tuning: each prompt token is a weighted sum of sampled embeddings ${\bm{e}}_j$, $1\leq j \leq m$, with the weights given by the entries of ${\bm{p}}'_i$; all ${\bm{e}}_j$ and ${\bm{p}}'_i$ are co-tuned. (c) Residual Prompt Tuning: reparameterizes the prompt with an autoencoder that has a residual connection. (d) SuperPos-Prompt can also be interpreted as a linear up-projection initialized with sampled embeddings. (e) Multi-task Subspace Finding: an autoencoder is trained over pre-trained prompts. (f) Intrinsic Subspace Tuning: uses the pre-trained decoder from Multi-task Subspace Finding to map lower-dimensional prompts to the model's dimension.
  • Figure 2: Results from our experiments using T5v1.1 Base LM-Adapted as the backbone. (a) Learning curves comparing the effect of dropout on SuperPos-Prompt for selected tasks. (b) Learning curves comparing various prompt tuning methods across selected tasks, all trained without dropout. (c) Ablation on the number of sampled tokens ($m$) for SuperPos-Prompt; the x-axis is the sampled token count and the y-axis the peak score on each task's metric. (d) Cosine similarity of the superposition weights of each prompt token across all tasks.
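
The comparison in Figure 2(d) can be approximated by a sketch that computes, for one prompt token, the pairwise cosine similarity of its learned weight vectors $\bm{p}'_i$ across tasks. This assumes the weights are stored per task as an $n \times m$ list of rows and that every task used the same $m$ sampled embeddings (so index $j$ refers to the same $\bm{e}_j$ everywhere); the summary does not confirm either point. The function names and task names are hypothetical, and the values in any example are made up rather than taken from the paper.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def task_similarity(
    weights_by_task: Dict[str, List[Vector]], token: int
) -> Dict[Tuple[str, str], float]:
    """Pairwise cosine similarity of one prompt token's superposition weights.

    Keys are (task_a, task_b) pairs in sorted task-name order. The result is
    only meaningful if all tasks share the same m sampled embeddings.
    """
    tasks = sorted(weights_by_task)
    return {
        (a, b): cosine(weights_by_task[a][token], weights_by_task[b][token])
        for a, b in combinations(tasks, 2)
    }
```

Calling `task_similarity` once per token index yields one similarity matrix per prompt token, which matches the per-token framing of the panel.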