Prompt-Based Length Controlled Generation with Multiple Control Types

Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu

TL;DR

The paper addresses the problem of achieving accurate length-controlled generation in GPT-style LLMs across multiple control types. It introduces a prompt-based framework that combines a standard prompt extractor, a rule-based reward model, and PPO-based reinforcement learning, complemented by sample filtering at inference. Key contributions include (1) a standard prompt extraction mechanism capable of handling diverse user instructions, (2) a rule-based reward signal that supports multiple length-control types, and (3) an RL fine-tuning scheme plus sample filtering that substantially improves length-control accuracy while preserving text quality on CNNDM and NYT summarization tasks. The approach demonstrates strong generalization to unseen prompt templates and scalable applicability to different model sizes, with practical benefits for user-facing generation tasks that require controllable output length.

Abstract

Large language models (LLMs) have attracted great attention given their strong performance on a wide range of NLP tasks. In practice, users often expect generated texts to fall within a specific length range, making length controlled generation an important topic, especially for GPT-style models. Existing length control methods mostly focus on a simple control type of "equal to" a target length. Different from them, we propose a prompt-based method to achieve length controlled generation under different control types with high accuracy. In particular, we adopt reinforcement learning (RL) and sample filtering with the reward signal given by rule-based reward models, which enhances the length control ability of models by rewarding outputs that follow certain control instructions. In addition, we introduce a standard prompt extractor to parse arbitrary users' input into standard control instructions. Experiments show that our method significantly improves the accuracy of prompt-based length control on popular summarization datasets like CNNDM and NYT under multiple control types. Moreover, both the standard prompt extractor and RL-tuned model show strong generalization to unseen control prompt templates.
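
The rule-based reward and sample filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `LengthControl` container, the control-type names other than "equal", the inclusive treatment of bounds, the linear error shape, and whitespace token counting are all assumptions made for illustration. The only property taken from the paper is that the reward is the negative of a control error.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LengthControl:
    """Hypothetical container for a parsed standard control instruction."""
    kind: str      # assumed names: "equal", "less", "more", "between"
    low: int = 0   # target length ("equal") or minimum length
    high: int = 0  # maximum length


def control_error(length: int, ctrl: LengthControl) -> int:
    """Distance from the allowed length range; 0 means the instruction is met.

    Bounds are treated as inclusive, which is an assumption of this sketch.
    """
    if ctrl.kind == "equal":
        return abs(length - ctrl.low)
    if ctrl.kind == "less":
        return max(0, length - ctrl.high)
    if ctrl.kind == "more":
        return max(0, ctrl.low - length)
    if ctrl.kind == "between":
        return max(0, ctrl.low - length) + max(0, length - ctrl.high)
    raise ValueError(f"unknown control type: {ctrl.kind}")


def reward(length: int, ctrl: LengthControl) -> int:
    """Rule-based reward: the negative of the control error."""
    return -control_error(length, ctrl)


def filter_samples(
    candidates: List[str],
    ctrl: LengthControl,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> str:
    """Sample filtering: return the candidate with the highest reward.

    Ties are resolved in favor of the earliest candidate.
    """
    return max(candidates, key=lambda c: reward(count_tokens(c), ctrl))


if __name__ == "__main__":
    ctrl = LengthControl(kind="between", low=4, high=6)
    samples = ["a b c", "a b c d e", "a b c d e f g h"]
    print(filter_samples(samples, ctrl))
```

The same `reward` function could serve as the scalar signal during RL fine-tuning and as the ranking score at inference, which mirrors the dual use of the reward model that the paper describes.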

Paper Structure (37 sections, 3 equations, 8 figures, 15 tables)

This paper contains 37 sections, 3 equations, 8 figures, 15 tables.

Figures (8)

  • Figure 1: Overview of the model architecture. In the training stage, the scores given by the reward model are used for reinforcement learning. In the inference stage, the scores are used to rank and select the output sequences generated by the PLM/LLM.
  • Figure 2: Illustration of the Standard Prompt Extractor (SPE). Generative models are trained to output the standard control prompts (SCPs) directly (left), while discriminative models are trained to predict the type of each control instruction, as well as the requested length values from the user utterance, such as the minimum and maximum values (right).
  • Figure 3: Plots of the control error functions, which are the negatives of the reward functions.
  • Figure 4: Learning curves of the Standard Prompt Extractors. (a) Validation losses of GPT extractor. (b) Validation losses of BERT extractor. (c) Matching accuracy of GPT extractor. (d) Matching accuracy of BERT extractor. We show the curves of validation cross entropy and matching rate for both cases.
  • Figure 5: Learning curves of GPT-S for single-type control instruction (only "equal to") without sample filtering.
  • ...and 3 more figures