ULPT: Prompt Tuning with Ultra-Low-Dimensional Optimization
Zijun Wu, Yongchang Hao, Lili Mou
TL;DR
ULPT tackles the cost of fine-tuning large language models by decoupling prompt dimensionality from model dimensionality and learning prompts in an ultra-low-dimensional space. It decomposes the prompt embedding as $\mathbf{E} = \mathbf{Z} \tilde{\mathbf{P}}$, where $\mathbf{Z} \in \mathbb{R}^{n\times r}$ is trained and $\tilde{\mathbf{P}} \in \mathbb{R}^{r\times d}$ is a fixed random projection. It also introduces a learnable shift $\mathbf{b}$ and scale $\mathbf{s}$ to better align the projected embeddings, for a total of $nr + 2d$ trainable parameters. Theoretical results show that random projections can preserve high-rank structure (Johnson-Lindenstrauss-type guarantees) and that gradient descent converges under standard PL/Lipschitz assumptions when the projection is fixed and the shift is nonzero. Empirically, ULPT achieves performance comparable or superior to that of vanilla prompt tuning across 21 NLP tasks with as little as 2% of the trainable parameters, and it scales effectively to decoder models like Bloomz, highlighting a practical path for scalable LLM customization and continual learning.
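The decomposition above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the helper names `make_frozen_projection` and `up_project`, the Gaussian sampling of the projection, the toy sizes, and the per-dimension affine form (scale applied before shift) are all assumptions made for illustration.

```python
import random


def make_frozen_projection(r, d, seed=0):
    """Sample a fixed r x d Gaussian matrix (assumed distribution).

    It can be regenerated from the seed, so it never needs to be trained or stored.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(r)]


def up_project(Z, P, scale, shift):
    """Map n x r prompts to n x d embeddings.

    Assumed form: E[i][j] = scale[j] * sum_k Z[i][k] * P[k][j] + shift[j].
    """
    r, d = len(P), len(P[0])
    return [
        [scale[j] * sum(z_row[k] * P[k][j] for k in range(r)) + shift[j]
         for j in range(d)]
        for z_row in Z
    ]


# Toy sizes for illustration only: n prompt tokens, r low dimension, d model dimension.
n, r, d = 4, 2, 8
P = make_frozen_projection(r, d, seed=0)      # frozen, never updated

init = random.Random(1)
Z = [[init.uniform(-0.5, 0.5) for _ in range(r)] for _ in range(n)]  # trainable
scale = [1.0] * d                             # trainable
shift = [0.0] * d                             # trainable

E = up_project(Z, P, scale, shift)            # n x d prompt embeddings fed to the model
```

In this sketch only `Z`, `scale`, and `shift` would receive gradient updates, while `P` stays reproducible from its seed.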
Abstract
Large language models achieve state-of-the-art performance but are costly to fine-tune due to their size. Parameter-efficient fine-tuning methods, such as prompt tuning, address this by reducing trainable parameters while maintaining strong performance. However, prior methods tie prompt embeddings to the model's dimensionality, which may not scale well as LLMs grow larger and the number of customized LLMs increases. In this paper, we propose Ultra-Low-dimensional Prompt Tuning (ULPT), which optimizes prompts in a low-dimensional space (e.g., 2D) and uses a random but frozen matrix for the up-projection. To enhance alignment, we introduce learnable shift and scale embeddings. ULPT drastically reduces the number of trainable parameters: for example, a 2D configuration uses only 2% of the parameters of vanilla prompt tuning while retaining most of the performance across 21 NLP tasks. Our theoretical analysis shows that random projections can capture high-rank structures effectively, and experimental results demonstrate ULPT's competitive performance compared with existing parameter-efficient methods.
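The parameter saving can be checked with simple arithmetic. The sketch below compares the $nd$ parameters of vanilla prompt tuning with the $nr + 2d$ of ULPT; the values $n = 100$ prompt tokens and $d = 768$ are assumed for illustration and are not taken from the text above, and the function names are hypothetical.

```python
def vanilla_prompt_params(n, d):
    """Vanilla prompt tuning trains a full n x d embedding matrix."""
    return n * d


def ulpt_params(n, r, d):
    """ULPT trains the n x r low-dimensional prompts plus a d-dim shift and a d-dim scale."""
    return n * r + 2 * d


# Assumed sizes for illustration: 100 prompt tokens, model dimension 768.
n, d = 100, 768
baseline = vanilla_prompt_params(n, d)

for r in (2, 16, 64):
    count = ulpt_params(n, r, d)
    print(f"r={r:>2}: {count:>5} trainable parameters "
          f"({100 * count / baseline:.1f}% of vanilla prompt tuning)")
```

Under these assumed sizes the $r = 2$ setting lands at roughly 2% of the vanilla count, and the $2d$ term for shift and scale dominates the total when $r$ is very small.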
