ULPT: Prompt Tuning with Ultra-Low-Dimensional Optimization

Zijun Wu, Yongchang Hao, Lili Mou

TL;DR

ULPT tackles the cost of fine-tuning large language models by decoupling prompt dimensionality from model dimensionality and learning prompts in an ultra-low-dimensional space. It decomposes the prompt embedding as $\mathbf{E} = \mathbf{Z} \tilde{\mathbf{P}}$, where $\mathbf{Z} \in \mathbb{R}^{n\times r}$ is trained and $\tilde{\mathbf{P}} \in \mathbb{R}^{r\times d}$ is a fixed random projection, and it introduces a learnable shift $\mathbf{s}$ and scale $\mathbf{b}$ to align the projected embeddings, for a total of $nr + 2d$ trainable parameters. Theoretical results show that random projections can preserve high-rank structure (Johnson-Lindenstrauss-type guarantees) and that gradient descent converges under standard PL/Lipschitz assumptions when the projection is fixed and the shift is nonzero. Empirically, ULPT matches or exceeds vanilla prompt tuning across 21 NLP tasks with as little as 2% of the trainable parameters, and it scales to decoder models such as Bloomz, offering a practical path to scalable LLM customization and continual learning.
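The decomposition above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the toy sizes, and in particular the form of the alignment step (an element-wise `scale[j] * projected + shift[j]`) are assumptions, since this summary only says that a shift and a scale are learned and that they contribute $2d$ parameters.

```python
import random


def make_frozen_projection(r, d, seed=0):
    """Sample the fixed r x d Gaussian up-projection once; it is never trained."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(r)]


def ulpt_prompt(Z, P, scale, shift):
    """Up-project Z (n x r) with P (r x d), then align each of the d dimensions.

    ASSUMPTION: alignment is modeled as scale[j] * (Z P)[i][j] + shift[j];
    the exact form used in the paper is not given in this summary.
    """
    n, r, d = len(Z), len(P), len(P[0])
    E = []
    for i in range(n):
        row = []
        for j in range(d):
            projected = sum(Z[i][k] * P[k][j] for k in range(r))
            row.append(scale[j] * projected + shift[j])
        E.append(row)
    return E


# Toy sizes (hypothetical): n prompt tokens, rank r, model dimension d.
n, r, d = 4, 2, 8
P = make_frozen_projection(r, d)                  # frozen, not counted as trainable
Z = [[0.1 * (i + 1)] * r for i in range(n)]       # trainable, n * r values
scale = [1.0] * d                                 # trainable, d values
shift = [0.5] * d                                 # trainable, d values

E = ulpt_prompt(Z, P, scale, shift)
trainable = n * r + 2 * d
print(len(E), len(E[0]), trainable)  # prints: 4 8 24
```

Because `P` can be regenerated from its seed, only `Z`, `scale`, and `shift` would need to be stored per task in a setup like this one.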

Abstract

Large language models achieve state-of-the-art performance but are costly to fine-tune due to their size. Parameter-efficient fine-tuning methods, such as prompt tuning, address this by reducing trainable parameters while maintaining strong performance. However, prior methods tie prompt embeddings to the model's dimensionality, which may not scale well with larger LLMs and more customized LLMs. In this paper, we propose Ultra-Low-dimensional Prompt Tuning (ULPT), which optimizes prompts in a low-dimensional space (e.g., 2D) and uses a random but frozen matrix for the up-projection. To enhance alignment, we introduce learnable shift and scale embeddings. ULPT drastically reduces the number of trainable parameters: for example, the 2D configuration uses only 2% of the parameters of vanilla prompt tuning while retaining most of the performance across 21 NLP tasks. Our theoretical analysis shows that random projections can capture high-rank structures effectively, and experimental results demonstrate ULPT's competitive performance over existing parameter-efficient methods.
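As a rough check on the 2% figure, the parameter counts can be worked out directly. The sketch below assumes $n = 100$ prompt tokens and $d = 768$ (the T5-base dimension), both taken from the Figure 2 caption; pairing them with $r = 2$ as the setting behind the quoted percentage is an assumption, and the helper names are hypothetical.

```python
def vanilla_prompt_params(n, d):
    """Vanilla prompt tuning trains one d-dimensional embedding per prompt token."""
    return n * d


def ulpt_params(n, r, d):
    """ULPT trains Z (n x r) plus a d-dimensional shift and a d-dimensional scale."""
    return n * r + 2 * d


n, d, r = 100, 768, 2  # assumed setting: 100 tokens, T5-base width, 2D prompts
vanilla = vanilla_prompt_params(n, d)
ulpt = ulpt_params(n, r, d)
ratio = ulpt / vanilla
print(f"{ulpt} / {vanilla} = {ratio:.1%}")  # prints: 1736 / 76800 = 2.3%
```

Under these assumptions most of the remaining parameters come from the $2d$ shift and scale values rather than from $\mathbf{Z}$, so the count grows slowly with the number of prompt tokens.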

Paper Structure

This paper contains 19 sections, 8 theorems, 21 equations, 6 figures, 5 tables.

Key Result

Lemma 1

Sample a random matrix ${\bm{A}} \in {\mathbb{R}}^{r \times m}$ such that each element follows the standard Gaussian distribution. Let $\epsilon \in (0, 1/2]$ and $r \in {\mathbb{N}}_+$. There exists a constant $c$ such that for any ${\bm{x}} \in {\mathbb{R}}^d$ ...
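The inequality of Lemma 1 is cut off in this listing. Assuming it is the usual Johnson-Lindenstrauss-type statement, namely that $\|{\bm{A}}{\bm{x}}\| / \sqrt{r}$ stays within a factor $1 \pm \epsilon$ of $\|{\bm{x}}\|$ with probability that improves as $r$ grows, the effect can be observed numerically. The sketch below is only an empirical illustration under that assumption; the function names, sizes, and seed are hypothetical, and it says nothing about the constant $c$.

```python
import math
import random


def projected_norm_ratio(x, r, rng):
    """Return ||A x|| / (sqrt(r) * ||x||) for a fresh r x len(x) standard Gaussian A."""
    squared = 0.0
    for _ in range(r):
        # One row of A, dotted with x.
        dot = sum(rng.gauss(0.0, 1.0) * xi for xi in x)
        squared += dot * dot
    return math.sqrt(squared / r) / math.sqrt(sum(xi * xi for xi in x))


def max_deviation(r, trials=20, d=32, seed=0):
    """Largest |ratio - 1| observed over several independent projections."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    return max(abs(projected_norm_ratio(x, r, rng) - 1.0) for _ in range(trials))


for r in (2, 16, 256):
    print(f"r={r:>3}: max |ratio - 1| over 20 trials = {max_deviation(r):.3f}")
```

The squared ratio is a chi-squared variable with $r$ degrees of freedom divided by $r$, so its spread shrinks roughly like $\sqrt{2/r}$; the printed deviations should therefore tend to decrease as $r$ increases, although individual runs at small $r$ vary widely.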

Figures (6)

  • Figure 1: Overview of our approach. (a) ULPT up-projects ultra-low-dimensional embeddings with a random but fixed matrix, followed by a learnable alignment mechanism shared across all up-projected embeddings. (b) ULPT can significantly reduce parameter usage for LLM customization.
  • Figure 2: Distribution of prompt embedding values over 100 prompt tokens. We randomly selected $20$ dimensions from the original prompt embeddings, which have 768 dimensions as in the T5-base model. The mean, 25/75 percentiles, and min/max are shown for the embedding values learned in the CoLA and SST-2 tasks (details are given in the experimental settings section).
  • Figure 3: Left: Training loss curves on SST-2 comparing ULPT with and without learnable shift and scale embeddings across different rank configurations. Right: Evaluation accuracy curves on SST-2. For clarity, we present the case $r=2$, where our ULPT is at a disadvantage. The trend for other configurations is similar.
  • Figure 4: Pairwise similarities of the learned shift (left) and scale (right) embeddings for various rank configurations on SST-2.
  • Figure 5: Results on MNLI and Natural Questions with the T5-base model. The number of prompt tokens for both ULPT and naïve prompt tuning varies from $10$ to $100$.
  • ...and 1 more figure

Theorems & Definitions (15)

  • Lemma 1
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Theorem 3
  • proof
  • Lemma 7
  • proof
  • Lemma 8
  • ...and 5 more