Prompt-Based Length Controlled Generation with Multiple Control Types
Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
TL;DR
The paper addresses accurate length-controlled generation in GPT-style LLMs across multiple control types, rather than only the "equal to" case. It introduces a prompt-based framework that combines a standard prompt extractor, a rule-based reward model, and PPO-based reinforcement learning, complemented by sample filtering at inference. Key contributions include (1) a standard prompt extractor that converts diverse user instructions into standard control instructions, (2) a rule-based reward signal that supports multiple length-control types, and (3) an RL fine-tuning scheme plus sample filtering that improves length-control accuracy while preserving text quality on the CNNDM and NYT summarization datasets. Both the prompt extractor and the RL-tuned model generalize to unseen prompt templates, and the approach applies across model sizes, which makes it practical for user-facing generation tasks that require controllable output length.
Abstract
Large language models (LLMs) have attracted great attention given their strong performance on a wide range of NLP tasks. In practice, users often expect generated texts to fall within a specific length range, making length-controlled generation an important topic, especially for GPT-style models. Existing length control methods mostly focus on a single, simple control type: "equal to" a target length. In contrast, we propose a prompt-based method that achieves length-controlled generation under different control types with high accuracy. In particular, we adopt reinforcement learning (RL) and sample filtering, with the reward signal given by rule-based reward models; this enhances the length control ability of models by rewarding outputs that follow the given control instructions. In addition, we introduce a standard prompt extractor that parses arbitrary user input into standard control instructions. Experiments show that our method significantly improves the accuracy of prompt-based length control on popular summarization datasets such as CNNDM and NYT under multiple control types. Moreover, both the standard prompt extractor and the RL-tuned model show strong generalization to unseen control prompt templates.
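
To make the idea of a rule-based reward and sample filtering concrete, the following is a minimal sketch, not the paper's implementation. It assumes four hypothetical control types ("equal", "less", "more", "between"; only "equal to" is named in the abstract), assumes the reward is the negative distance between the output length and the allowed range, and assumes lengths are counted by whitespace splitting. The names `length_reward` and `filter_samples` are illustrative only.

```python
from typing import Callable, Optional, Sequence


def length_reward(length: int, kind: str,
                  low: Optional[int] = None,
                  high: Optional[int] = None) -> float:
    """Rule-based reward: 0.0 if the length obeys the instruction,
    otherwise the negative size of the violation (an assumed form)."""
    if kind == "equal":        # target length is passed as `low`
        violation = abs(length - low)
    elif kind == "less":       # upper bound is passed as `high`
        violation = max(0, length - high)
    elif kind == "more":       # lower bound is passed as `low`
        violation = max(0, low - length)
    elif kind == "between":    # allowed range is [low, high]
        violation = max(0, low - length) + max(0, length - high)
    else:
        raise ValueError(f"unknown control type: {kind!r}")
    return float(-violation)


def filter_samples(candidates: Sequence[str], kind: str,
                   low: Optional[int] = None,
                   high: Optional[int] = None,
                   count_tokens: Callable[[str], int] = lambda s: len(s.split()),
                   ) -> str:
    """Sample filtering: among several generated candidates, keep the one
    with the highest rule-based reward (the first one wins on ties)."""
    return max(candidates,
               key=lambda c: length_reward(count_tokens(c), kind, low, high))


if __name__ == "__main__":
    outputs = ["a b c d e f", "a b c", "a b"]
    # Instruction "less than 4 tokens": the first candidate violates it by 2.
    print(filter_samples(outputs, "less", high=4))
```

In an RL setting the same `length_reward` value could serve as the scalar reward for each sampled output, while `filter_samples` would be applied only at inference; how the paper scales or combines these signals is not specified here.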
