Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning
Zhen-Ru Zhang, Chuanqi Tan, Haiyang Xu, Chengyu Wang, Jun Huang, Songfang Huang
TL;DR
The paper addresses the high cost of full fine-tuning by revisiting prefix tuning and introducing Adaptive Prefix Tuning (APT), which uses token-level and layer-level gates to adapt the prefix in each Transformer layer. The method explicitly accounts for layer-wise differences in feature representations, enabling more efficient and effective task adaptation. Empirical results on SuperGLUE and NER across multiple backbones show that APT consistently outperforms P-Tuning v2, including in few-shot settings, and the analyses show that the learned gate weights are distributed in ways that align with task properties. The work demonstrates that adaptively gated prefixes can improve parameter-efficient fine-tuning and suggests directions for generalizing adaptive strategies to other architectures.
Abstract
Fine-tuning large pre-trained language models on various downstream tasks with all parameters is prohibitively expensive. Hence, parameter-efficient fine-tuning, which optimizes only a few task-specific parameters while keeping the pre-trained model frozen, has attracted attention. In this work, we focus on prefix tuning, which only optimizes continuous prefix vectors (i.e., pseudo tokens) inserted into Transformer layers. Based on the observation that the learned syntactic and semantic representations vary considerably across layers, we argue that an adaptive prefix can be better tailored to each layer than a fixed one, making fine-tuning more effective and efficient. Thus, we propose Adaptive Prefix Tuning (APT), which adjusts the prefix at both the fine-grained token level and the coarse-grained layer level with a gate mechanism. Experiments on the SuperGLUE and NER datasets show the effectiveness of APT. In addition, using the gate as a probe, we validate the efficiency and effectiveness of the variable prefix.
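To make the two gate levels concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes that the token-level gate is a sigmoid over a linear projection of a hidden-state vector from the previous layer (`h_prev`, projected by a hypothetical `d x p` matrix `w`), giving one weight in (0, 1) per pseudo token, and that the layer-level gate is a single learnable scalar per layer (`layer_gate`) that scales that layer's whole prefix. The helper names `sigmoid`, `token_gate`, and `adaptive_prefix` are hypothetical, and the paper's exact parameterization may differ.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for very negative x
    return z / (1.0 + z)


def token_gate(h_prev: Vector, w: Matrix) -> Vector:
    """Hypothetical token-level gate: one weight per pseudo token.

    h_prev: length-d hidden-state vector from the previous layer
            (ASSUMPTION: some pooled representation; the paper may differ).
    w:      d x p projection matrix (hypothetical learnable parameter).
    """
    d, p = len(w), len(w[0])
    assert len(h_prev) == d, "h_prev must match the row count of w"
    # alpha_j = sigmoid(sum_k h_prev[k] * w[k][j]) for each pseudo token j
    return [sigmoid(sum(h_prev[k] * w[k][j] for k in range(d))) for j in range(p)]


def adaptive_prefix(prefix: Matrix, h_prev: Vector, w: Matrix, layer_gate: float) -> Matrix:
    """Rescale one layer's prefix at token level (alpha_j) and layer level (layer_gate).

    prefix:     p pseudo-token vectors for this layer (e.g. the prefix keys or values).
    layer_gate: hypothetical learnable scalar for this layer.
    """
    alpha = token_gate(h_prev, w)
    assert len(alpha) == len(prefix), "w must have one column per pseudo token"
    # Row j of the result is layer_gate * alpha_j * prefix[j]
    return [[layer_gate * a * x for x in row] for a, row in zip(alpha, prefix)]
```

For example, with an all-zero `w` every token weight is 0.5, so `adaptive_prefix([[1.0, 2.0], [3.0, 4.0]], [0.3, -0.7], [[0.0, 0.0], [0.0, 0.0]], 2.0)` returns the prefix unchanged. The split mirrors the abstract's distinction: the token gate reweights individual pseudo tokens within a layer, while the layer gate controls how much that layer's prefix contributes overall.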
