Controllable Text Generation in the Instruction-Tuning Era

Dhananjay Ashok, Barnabas Poczos

TL;DR

This paper investigates controllable text generation in the era of instruction-tuned LLMs, revealing that prompting-based baselines often outperform traditional controllable methods on stylistic and structural constraints while approaching human performance on many stylistic tasks. It introduces ConGenBench, a diverse benchmark with 17 task datasets and 18 constraint datasets, and proposes a method to automatically generate constraint datasets using in-context learning, removing dependence on curated constraint resources. The study evaluates 9 baselines and methods, demonstrating that prompting strategies are a strong baseline for instruction-tuned models and highlighting the need for research into more challenging constraints and structural control. Collectively, the work provides both practical benchmarks and methodological tools to expand controllable generation research in the instruction-tuning regime, with implications for safer and more aligned AI systems.

Abstract

While most research on controllable text generation has focused on steering base Language Models, the emerging instruction-tuning and prompting paradigm offers an alternate approach to controllability. We compile and release ConGenBench, a testbed of 17 different controllable generation tasks, using a subset of it to benchmark the performance of 9 different baselines and methods on Instruction-tuned Language Models. To our surprise, we find that prompting-based approaches outperform controllable text generation methods on most datasets and tasks, highlighting a need for research on controllable text generation with Instruction-tuned Language Models specifically. Prompt-based approaches match human performance on most stylistic tasks while lagging on structural tasks, foregrounding a need to study more varied constraints and more challenging stylistic tasks. To facilitate such research, we provide an algorithm that uses only a task dataset and a Large Language Model with in-context capabilities to automatically generate a constraint dataset. This method eliminates the field's dependence on pre-curated constraint datasets, hence vastly expanding the range of constraints that can be studied in the future.

Paper Structure (30 sections, 12 figures, 8 tables, 1 algorithm)

Figures (12)

  • Figure 1: Performance of each type of method when the task is to make the output follow stylistic constraints. Each point is a specific method/baseline run on a specific task dataset; the boxes show the mean and range of the performance of the methods under a common type. While controllable text generation methods (controllable) outperform simple baselines (baseline), they lag behind simple prompting-based approaches (prompting). Prompting-based approaches are competitive with human performance on most stylistic tasks.
  • Figure 2: Example datapoint (prompt, output, constraint score) produced by the synthetic data generation algorithm
  • Figure 3: ConGenBench: an aggregation of 17 different task datasets, supplemented with 18 different constraint datasets. Each colour grouping represents a different task
  • Figure 4: Randomly selected example with low sentiment score (as determined by human annotation)
  • Figure 5: Task Window for AMT Workers on Ironic Story Writing Task
  • ...and 7 more figures