Control Prefixes for Parameter-Efficient Text Generation

Jordan Clive, Kris Cao, Marek Rei

TL;DR

Control Prefixes is a parameter-efficient text generation method that keeps a large pretrained LM fixed and learns one control prefix per attribute, selected by input-level guidance $G$, so that generation can be steered for each datapoint without full fine-tuning. Only a compact set of prefixes is optimized, including re-parameterized components shared across attention classes. The method achieves state-of-the-art or competitive results on data-to-text benchmarks (e.g., WebNLG, DART), yields SARI and FKGL gains on text simplification, and attains strong ROUGE scores with superior human evaluations on XSum, all while adding less than 3% additional parameters. It also shows zero-shot transfer when attribute labels are semantically similar, and the learned prefixes are organized interpretably across layers and attention types, which makes the approach practical for deploying scalable, controllable NLG systems.

Abstract

Prefix-tuning is a powerful lightweight technique for adapting a large pre-trained language model to a downstream application. However, it uses the same dataset-level tuned prompt for all examples in the dataset. We extend this idea and propose a dynamic method, Control Prefixes, which allows for the inclusion of conditional input-dependent information, combining the benefits of prompt tuning and controlled generation. The method incorporates attribute-level learnable representations into different layers of a pre-trained transformer, allowing for the generated text to be guided in a particular direction. We provide a systematic evaluation of the technique and apply it to five datasets from the GEM benchmark for natural language generation (NLG). Although the aim is to develop a parameter-efficient model, we show Control Prefixes can even outperform full fine-tuning methods. We present state-of-the-art results on several data-to-text datasets, including WebNLG.
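
The per-example selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `select_prefixes`, `general_prefix` and `control_prefixes` are hypothetical, prefixes are toy lists of vectors rather than per-layer key-value tensors, and placing the control prefix before the shared prefix is an assumption made only for illustration.

```python
from typing import Dict, List

Vector = List[float]
Prefix = List[Vector]  # a prefix is a short sequence of learned vectors


def select_prefixes(
    guidance: List[str],
    general_prefix: Prefix,
    control_prefixes: Dict[str, Prefix],
) -> List[Prefix]:
    """Build one prefix per example in a batch.

    Plain prefix-tuning would give every example `general_prefix` only.
    Here each example also receives the control prefix matching its
    attribute label (concatenation order is an assumption).
    """
    batch = []
    for label in guidance:
        control = control_prefixes[label]  # look up by attribute label
        batch.append(control + general_prefix)
    return batch


# Toy values: a shared prefix of two vectors, one vector per attribute.
P = [[0.0, 0.0], [0.1, 0.1]]
C = {"A": [[1.0, 1.0]], "B": [[2.0, 2.0]], "C": [[3.0, 3.0]]}
G = ["A", "B", "A", "C", "B"]  # guidance for a batch of five examples

batch_prefixes = select_prefixes(G, P, C)
```

Examples that share an attribute label receive identical prefixes, while the pretrained model's own weights are never touched; only `P` and the entries of `C` would be trained.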

Paper Structure

This paper contains 27 sections, 3 equations, 6 figures, 15 tables.

Figures (6)

  • Figure 1: High-level diagram contrasting prefix-tuning and Control Prefixes in the single-task setup for a PLM such as BART$_{\text{LARGE}}$. The same single-task batch (examples 1, 2, 3, 4 and 5) is considered for both setups. Left: Prefix-tuning has one general prefix $P$ for all examples. Right: Control Prefixes uses additional attribute information at the input level, $G$, in i). This conditional information is used in ii) to dictate which control prefix ($C_A$, $C_B$, $C_C$) to use for a particular example in a batch. This takes advantage of prefix-tuning's capacity to include different prefixes in one forward pass.
  • Figure 2: t-SNE visualizations for the decoder self-attention constituent of the simplification model's length compression control prefixes. Each circle represents a control prefix corresponding to each length ratio (bins of fixed width 0.05, from 0 to 1.1).
  • Figure 3: t-SNE visualizations for the encoder constituent of control prefixes representing WebNLG categories seen during training. Each circle represents a category seen during training for the Control Prefixes ($A1$) model. All 15 categories are seen categories in WebNLG+ 2020, along with the category Company. WebNLG+ 2020 has 3 additional unseen categories to those shown.
  • Figure 4: Histogram illustrating the influence of different target length ratios on the actual length compression ratio output distribution for the simplification Control Prefixes model on the TurkCorpus validation set.
  • Figure 5: t-SNE visualizations for constituents of the length compression control prefixes learnt as part of the simplification Control Prefixes model. Each diagram depicts representations of control prefixes corresponding to each length value (41 bins of fixed width 0.05, from 0 to 2) for a particular attention mechanism. The dimension represented on the x-axis is stretched from a 1:1 to 2:1 aspect ratio for labelling clarity.
  • ...and 1 more figure
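
The length-ratio binning mentioned in the Figure 5 caption (41 bins of fixed width 0.05, from 0 to 2) can be checked with a short sketch. The function names are hypothetical, and rounding to the nearest bin, clipping ratios above 2, and measuring length as a plain integer count are all assumptions; the captions do not specify these details.

```python
BIN_WIDTH = 0.05
MAX_RATIO = 2.0
# Bin values 0.00, 0.05, ..., 2.00 give 2.0 / 0.05 + 1 = 41 bins.
NUM_BINS = round(MAX_RATIO / BIN_WIDTH) + 1


def length_ratio_bin(source_len: int, target_len: int) -> int:
    """Map a target/source length ratio to a bin index in [0, NUM_BINS - 1]."""
    if source_len <= 0:
        raise ValueError("source_len must be positive")
    ratio = target_len / source_len
    ratio = min(max(ratio, 0.0), MAX_RATIO)  # clip to the covered range (assumption)
    return round(ratio / BIN_WIDTH)  # nearest bin (assumption)


def bin_value(index: int) -> float:
    """Length ratio represented by a bin index."""
    return round(index * BIN_WIDTH, 2)


# A 15-unit output for a 20-unit input has ratio 0.75.
example_bin = length_ratio_bin(20, 15)
```

Under these assumptions each bin index would correspond to one control prefix, which is what each circle in the t-SNE plots represents.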