IAPT: Instruction-Aware Prompt Tuning for Large Language Models

Wei Zhu, Aaron Xuxiang Tian, Congrui Yin, Yuan Ni, Xiaoling Wang, Guotong Xie

TL;DR

This paper tackles the inefficiency of soft prompt tuning for large language models by proposing Instruction-Aware Prompt Tuning (IAPT), which generates instruction-conditioned soft prompts using four tokens per Transformer layer. IAPT introduces a bottleneck prompt generator with a self-attention pooler and learnable activation functions, enhanced by cross-layer parameter sharing to reduce overhead. The approach yields superior or competitive performance against strong PEFT baselines across diverse NLP tasks, while delivering lower latency in multi-tenant inference scenarios. The work advances practical, scalable fine-tuning of LLMs by tightly integrating instruction signals into per-layer prompts without extensive parameter growth.

Abstract

Soft prompt tuning is a widely studied parameter-efficient fine-tuning method. However, it has a clear drawback: many soft tokens must be inserted into the input sequences to guarantee downstream performance. As a result, soft prompt tuning receives less attention than low-rank adaptation (LoRA) in the large language model (LLM) era. In this work, we propose a novel prompt tuning method, Instruction-Aware Prompt Tuning (IAPT), that requires only four soft tokens. First, we install a parameter-efficient soft prompt generator at each Transformer layer to generate idiosyncratic soft prompts for each input instruction. The generated soft prompts can be seen as a semantic summary of the input instructions and can effectively guide the output generation. Second, the soft prompt generators are modules with a bottleneck architecture consisting of a self-attention pooling operation, two linear projections, and an activation function. Pilot experiments show that prompt generators at different Transformer layers require different activation functions. Thus, we propose to learn the idiosyncratic activation functions for prompt generators automatically with the help of rational functions. We have conducted experiments on various tasks, and the experimental results demonstrate that (a) our IAPT method can outperform the recent baselines with comparable tunable parameters, and (b) our IAPT method is more efficient than LoRA under the single-backbone multi-tenant setting.
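
The bottleneck prompt generator described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `PromptGeneratorSketch`, the weight names, the random initialization, and the exact pooling form (one learnable scoring column per soft token, softmax over instruction positions) are assumptions made for illustration, and `np.tanh` stands in for the learned activation. The order of operations (down-projection, pooling, activation, up-projection) follows the Figure 1 caption.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class PromptGeneratorSketch:
    """Hypothetical bottleneck generator: instruction states -> soft tokens."""

    def __init__(self, hidden_dim, bottleneck_dim, num_soft_tokens=4, seed=0):
        rng = np.random.default_rng(seed)
        # Down-projection into the bottleneck dimension.
        self.w_down = rng.normal(0.0, 0.02, (hidden_dim, bottleneck_dim))
        # Assumed pooler: one scoring column per generated soft token.
        self.w_pool = rng.normal(0.0, 0.02, (bottleneck_dim, num_soft_tokens))
        # Up-projection back to the model's hidden dimension.
        self.w_up = rng.normal(0.0, 0.02, (bottleneck_dim, hidden_dim))

    def __call__(self, instruction_hidden, activation=np.tanh):
        # instruction_hidden has shape (seq_len, hidden_dim).
        z = instruction_hidden @ self.w_down   # (seq_len, bottleneck_dim)
        scores = z @ self.w_pool               # (seq_len, num_soft_tokens)
        attn = softmax(scores, axis=0)         # normalize over positions
        pooled = attn.T @ z                    # (num_soft_tokens, bottleneck_dim)
        return activation(pooled) @ self.w_up  # (num_soft_tokens, hidden_dim)


gen = PromptGeneratorSketch(hidden_dim=16, bottleneck_dim=4)
short = gen(np.random.default_rng(1).normal(size=(5, 16)))
long = gen(np.random.default_rng(2).normal(size=(12, 16)))
print(short.shape, long.shape)  # (4, 16) (4, 16)
```

Because pooling collapses the sequence dimension, instructions of any length map to the same fixed number of soft tokens (four here), while the token values still depend on the instruction content.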

Paper Structure (39 sections, 4 equations, 4 figures, 9 tables)

This paper contains 39 sections, 4 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Schematic illustration of our IAPT method. Left: The prompt generator, which consists of a down-projection, a self-attention-based pooler (SA pooler), a learnable activation whose curvature is learned in the downstream task, and an up-projection. Right: The prompt generator takes the instructions' hidden states as input tensors and outputs the generated soft tokens, which are concatenated to the next layer's hidden states.
  • Figure 2: Performance under different tunable parameter budgets. The $x$-axis represents the number of tunable parameters, and the $y$-axis represents the performance score.
  • Figure 3: Performance under different soft prompt lengths.
  • Figure 4: The learned activation functions for the prompt generators at different Transformer layers.
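
The learnable activations referenced in Figures 1 and 4 are built from rational functions, that is, ratios of polynomials whose coefficients are trained. The sketch below shows one possible parameterization and is an assumption rather than the paper's exact formula: the function name `rational_activation`, the coefficient layout, and the "1 + |...|" denominator (a common way to avoid division by zero) are all chosen here for illustration.

```python
def rational_activation(x, numerator, denominator):
    """Evaluate P(x) / Q(x) for a scalar x.

    numerator:   coefficients a_0, a_1, ... of P(x) = sum_i a_i * x**i
    denominator: coefficients b_1, b_2, ... used in
                 Q(x) = 1 + |sum_j b_j * x**j|  (assumed safe form, Q >= 1)
    """
    p = sum(a * x**i for i, a in enumerate(numerator))
    q = 1.0 + abs(sum(b * x ** (j + 1) for j, b in enumerate(denominator)))
    return p / q


# With P(x) = x and Q(x) = 1 + |x| the function reduces to softsign.
softsign_like = rational_activation(-3.0, [0.0, 1.0], [1.0])
print(softsign_like)  # -0.75
```

Changing the coefficients changes the curve's shape, so giving each layer's prompt generator its own trainable coefficient set lets different layers end up with different activation functions.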