ACCEPT: Adaptive Codebook for Composite and Efficient Prompt Tuning

Yu-Chen Lin, Wei-Hua Li, Jun-Cheng Chen, Chu-Song Chen

TL;DR

This work proposes Adaptive Codebook for Composite and Efficient Prompt Tuning, allowing all soft prompts to share a set of learnable codebook vectors in each subspace, with each prompt differentiated by a set of adaptive weights.

Abstract

Prompt Tuning has become a popular Parameter-Efficient Fine-Tuning method owing to its strong performance with few updated parameters on various large-scale pretrained Language Models (PLMs). Traditionally, each prompt has been treated as indivisible and updated independently, so the number of parameters grows proportionally with prompt length. To address this issue, we propose Adaptive Codebook for Composite and Efficient Prompt Tuning (ACCEPT). Our method draws on the concept of product quantization (PQ), allowing all soft prompts to share a set of learnable codebook vectors in each subspace, with each prompt differentiated by a set of adaptive weights. We achieve superior performance on 17 diverse natural language tasks, including natural language understanding (NLU) and question answering (QA) tasks, while tuning only 0.3% of the PLM's parameters. Our approach also excels in few-shot and large-model settings, highlighting its significant potential.
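The parameter argument in the abstract can be illustrated with a rough count. The sketch below is not taken from the paper: it assumes a PQ-style layout in which the embedding dimension `d` is split evenly into `K` subspaces, each with `r` shared codewords, and each of the `m` prompts keeps `r` weights per subspace. The function names and the sizes used are hypothetical and chosen only for illustration.

```python
def full_prompt_params(m: int, d: int) -> int:
    """Parameters when each of the m prompts is an independent d-dim vector."""
    return m * d


def codebook_prompt_params(m: int, d: int, K: int, r: int) -> int:
    """Parameters for shared-codebook prompts under the assumed layout.

    Assumption: each of the K subspaces has dimension d // K and holds r
    codewords shared by all prompts; every prompt stores r weights per subspace.
    """
    if d % K != 0:
        raise ValueError("d must be divisible by K")
    codebook = K * r * (d // K)  # shared across all prompts
    weights = m * K * r          # per-prompt adaptive weights
    return codebook + weights


# Hypothetical sizes, for illustration only (not the paper's configuration).
m, d, K, r = 100, 768, 24, 10
print(full_prompt_params(m, d))            # 76800
print(codebook_prompt_params(m, d, K, r))  # 31680
```

Under these assumptions, each additional prompt costs `K * r` weights rather than `d` values, so the count is smaller whenever `K * r` is well below `d`.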

Paper Structure

This paper contains 20 sections, 7 equations, 5 figures, 16 tables.

Figures (5)

  • Figure 1: The overall model architecture of ACCEPT. We subdivide both (a) Soft-weighted Codebook Prepended Prompt (SCPP) and (b) Soft-weighted Codebook Added Prompt (SCAP) into $K$ subspaces. Each subspace has a codebook with $r$ codewords shared by all prompts. Each sub-prompt is a linear combination of the codewords and weights. (c) In the main architecture of ACCEPT, the final input is formed by prepending SCPP to the word embedding updated with SCAP. The pretrained model, with its parameters fixed, learns to output correct labels through tunable SCPP and SCAP.
  • Figure 2: Average performance on the GLUE and SuperGLUE benchmarks relative to the number of trainable parameters for the T5-base model. ACCEPT achieves the best performance with the fewest parameters.
  • Figure 3: Performance on the BoolQ, MultiRC and WiC datasets with different model sizes (T5-small, T5-base and T5-large). Our method shows improved performance as the model size increases and reaches SOTA on larger models, showcasing the potential of ACCEPT.
  • Figure 4: Performance on the MRPC and STS-B datasets and their relative training time (normalized to the one with $m = 100$) for various prompt lengths $m = \{20, 40, 60, 80, 100\}$. Both datasets show the best performance at $m = 60$.
  • Figure 5: Training curve (left) and validation accuracy curve (right) comparison between different prompt initialization strategies across QQP, QNLI and SST-2.
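
The composition described in the Figure 1 caption can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: sub-prompts are assumed to be concatenated across the $K$ subspaces to form a full prompt, and "updated with SCAP" is assumed to mean element-wise addition with one added prompt per input token. The names `compose_prompts` and `build_input` are hypothetical.

```python
def compose_prompts(codebooks, weights):
    """Build prompts as weighted sums of shared codewords, per subspace.

    codebooks[k][j] : codeword j of subspace k (list of floats)
    weights[i][k][j]: weight of prompt i for codeword j in subspace k
    Returns a list of prompts; each concatenates its K sub-prompts
    (concatenation is an assumption of this sketch).
    """
    prompts = []
    for w_i in weights:
        prompt = []
        for codebook_k, w_ik in zip(codebooks, w_i):
            sub_dim = len(codebook_k[0])
            # Sub-prompt = linear combination of this subspace's codewords.
            sub_prompt = [
                sum(w * cw[t] for w, cw in zip(w_ik, codebook_k))
                for t in range(sub_dim)
            ]
            prompt.extend(sub_prompt)
        prompts.append(prompt)
    return prompts


def build_input(word_emb, scpp, scap):
    """Prepend SCPP to the word embeddings updated with SCAP.

    Assumption: the update is element-wise addition, one SCAP row per token.
    """
    if len(word_emb) != len(scap):
        raise ValueError("SCAP must have one row per input token")
    updated = [[x + a for x, a in zip(row, add)]
               for row, add in zip(word_emb, scap)]
    return scpp + updated


# Toy example: K = 2 subspaces, r = 2 codewords each, sub-dimension 2.
codebooks = [[[1.0, 0.0], [0.0, 1.0]],
             [[2.0, 2.0], [1.0, -1.0]]]
weights = [[[0.5, 0.25], [1.0, 2.0]]]     # a single prompt
prompt = compose_prompts(codebooks, weights)
print(prompt)                              # [[0.5, 0.25, 4.0, 0.0]]

word_emb = [[1.0, 1.0, 1.0, 1.0]]          # one token, d = 4
print(build_input(word_emb, prompt, prompt))
```

In this sketch only `codebooks` and `weights` would be trainable; the word embeddings and the pretrained model stay fixed, matching the caption's description.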