SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings
MohammadAli SadraeiJavaeri, Ehsaneddin Asgari, Alice Carolyn McHardy, Hamid Reza Rabiee
TL;DR
The paper tackles the inefficiency of soft prompt tuning in data-limited settings by introducing SuperPos-Prompt, a reparameterization that forms each prompt token as a weighted superposition of multiple pretrained token embeddings, enabling more stable and rapid learning without relying on pretrained prompts. It additionally shows that removing dropout from the frozen network improves convergence across prompt-tuning methods. Empirically, SuperPos-Prompt yields average gains over Residual Prompt tuning of $+6.4$ on T5-Small and $+5.0$ on T5-Base across 13 GLUE/SuperGLUE tasks, occasionally surpassing full fine-tuning, and its performance plateaus around $m=128$ sampled embeddings. The work highlights a practical, data-efficient pathway for parameter-efficient fine-tuning (PEFT) in NLP and suggests future exploration of pretrained source prompts and multi-modal extensions.
Abstract
Soft prompt tuning techniques have recently gained traction as an effective strategy for the parameter-efficient tuning of pretrained language models, since they minimize the number of model parameters that must be adjusted. Despite their growing use, achieving optimal tuning with soft prompts, especially for smaller datasets, remains a substantial challenge. This study makes two contributions in this domain: (i) we introduce SuperPos-Prompt, a new reparameterization technique employing the superposition of multiple pretrained vocabulary embeddings to improve the learning of soft prompts. Our experiments across several GLUE and SuperGLUE benchmarks consistently highlight SuperPos-Prompt's superiority over Residual Prompt tuning, exhibiting an average score increase of $+6.4$ in T5-Small and $+5.0$ in T5-Base along with faster convergence. Remarkably, SuperPos-Prompt occasionally outperforms even full fine-tuning methods. (ii) Additionally, we demonstrate enhanced performance and rapid convergence by omitting dropout from the frozen network, yielding consistent improvements across various scenarios and tuning methods.
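
To make the reparameterization concrete, the following is a minimal sketch of the forward computation only: each prompt token is a weighted sum of $m$ embeddings sampled from the vocabulary table. The function names (`sample_token_embeddings`, `superpose`, `build_prompt`), the toy embedding table, and the choices to share one sampled basis across all prompt tokens and to treat only the mixing weights as trainable are assumptions made for illustration. They are not taken from the paper, whose exact parameterization may differ.

```python
import random

def sample_token_embeddings(embedding_table, m, seed=0):
    """Draw m distinct rows from a (pretrained) embedding table."""
    rng = random.Random(seed)
    rows = rng.sample(range(len(embedding_table)), m)
    return [embedding_table[r] for r in rows]

def superpose(weights, sampled):
    """Return sum_j weights[j] * sampled[j] as one prompt-token vector."""
    if len(weights) != len(sampled):
        raise ValueError("need exactly one weight per sampled embedding")
    dim = len(sampled[0])
    return [sum(w * e[d] for w, e in zip(weights, sampled)) for d in range(dim)]

def build_prompt(weight_rows, sampled):
    """Build one prompt token per weight vector (assumption: shared basis)."""
    return [superpose(w, sampled) for w in weight_rows]

# Toy stand-in for a pretrained vocabulary: 6 tokens, 3-dimensional embeddings.
toy_table = [[float(i + d) for d in range(3)] for i in range(6)]
basis = sample_token_embeddings(toy_table, m=4)

# Two prompt tokens, each described by m = 4 mixing weights. In this sketch
# the weights are the only quantities that training would update.
prompt = build_prompt([[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]], basis)
```

In this sketch a one-hot weight vector reproduces a single sampled embedding, so the prompt starts from points that already lie in the model's embedding space, which is one plausible reading of why the superposition helps on small datasets. A practical implementation would express the same sum as a matrix product in an autodiff framework so that the weights can be optimized by gradient descent.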
