
Length Controlled Generation for Black-box LLMs

Yuxuan Gu, Wenjie Wang, Xiaocheng Feng, Weihong Zhong, Kun Zhu, Lei Huang, Tat-Seng Chua, Bing Qin

TL;DR

The paper tackles the problem of precise output length control in large language models without retraining. It reframes length-controlled generation as sampling from a constrained target distribution π(y|x) ∝ f(y)P(y|x) and solves it with a Metropolis-Hastings framework augmented by an importance-sampling strategy, enabling efficient, parameter-free length control on black-box LLMs. Since the internal probabilities P(y|x) of black-box models are inaccessible, the authors estimate them with an LLM-as-Judge φ(y|x) and use a pairwise score to compare successive samples, maintaining alignment with the desired length. Experiments on CNN/Daily Mail and interval-length benchmarks across multiple models show near-perfect to perfect length control with minimal quality loss and rapid convergence, highlighting the method’s practicality for real-world controlled-generation tasks.
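
To make the acceptance step concrete, here is a minimal sketch of a Metropolis-Hastings acceptance probability for the target π(y|x) ∝ f(y)P(y|x), including the standard correction for the proposal q. Several pieces are illustrative assumptions, not the paper's definitions: the soft constraint `length_score` (exponential decay in word-count deviation) and `judge_ratio_from_preference`, which reads a pairwise judge score s = P(y' preferred over y) through an assumed Bradley-Terry model, so that φ(y'|x)/φ(y|x) = s/(1 - s).

```python
import math


def length_score(text: str, target_words: int, sharpness: float = 0.5) -> float:
    """Hypothetical soft constraint f(y): 1.0 at the target word count,
    decaying exponentially with the absolute deviation."""
    deviation = abs(len(text.split()) - target_words)
    return math.exp(-sharpness * deviation)


def judge_ratio_from_preference(s: float, eps: float = 1e-6) -> float:
    """Assumed Bradley-Terry reading of a pairwise judge score.

    If s = P(candidate preferred over current) = phi(y') / (phi(y') + phi(y)),
    then phi(y') / phi(y) = s / (1 - s). Clipping avoids division by zero.
    """
    s = min(max(s, eps), 1.0 - eps)
    return s / (1.0 - s)


def acceptance_probability(f_current: float, f_candidate: float,
                           judge_ratio: float, q_forward: float,
                           q_reverse: float) -> float:
    """Metropolis-Hastings acceptance for pi(y|x) proportional to f(y) P(y|x).

    alpha = min(1, f(y') * [P(y'|x) / P(y|x)] * q(y|y',x) / (f(y) * q(y'|y,x)))
    where judge_ratio stands in for P(y'|x) / P(y|x),
    q_forward = q(y'|y,x) and q_reverse = q(y|y',x).
    """
    numerator = f_candidate * judge_ratio * q_reverse
    denominator = f_current * q_forward
    if denominator <= 0.0:
        return 1.0  # convention: always move away from a zero-weight state
    return min(1.0, numerator / denominator)
```

Because only the ratio P(y'|x)/P(y|x) enters the acceptance test, the normalizing constant cancels, which is why a pairwise comparison between successive samples can suffice in place of absolute probabilities.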

Abstract

Large language models (LLMs) have demonstrated impressive instruction-following capabilities, yet they still struggle to control the length of generated text accurately, which is a fundamental requirement in many real-world applications. Existing length control methods involve fine-tuning the parameters of LLMs, which is inefficient and suboptimal for practical use. In this paper, we propose a novel iterative sampling framework for text length control, integrating the Metropolis-Hastings algorithm with an importance sampling acceleration strategy. This framework efficiently and reliably regulates LLMs to generate length-constrained text without modifying the underlying parameters, thereby preserving the original capabilities of LLMs. Experimental results demonstrate that our framework achieves almost 100% success rates of length control on Llama3.1 for tasks such as length-controlled abstractive summarization and length-constrained instruction following, with minimal additional computational overhead. This also highlights the significant potential of our method for precise length control across a broader range of applications, without compromising the versatility of LLMs.
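
The abstract reports success rates of length control. As a minimal sketch, assuming a generation counts as a success when its whitespace-delimited word count falls inside a [low, high] interval (with low == high for an exact target), such a rate could be computed as below; the paper's actual counting unit and tolerance may differ, and `length_success_rate` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def word_count(text: str) -> int:
    """Assumed length unit: whitespace-delimited words."""
    return len(text.split())


def length_success_rate(outputs: Sequence[str],
                        bounds: Sequence[Tuple[int, int]]) -> float:
    """Fraction of outputs whose word count lies in its [low, high] bound."""
    if len(outputs) != len(bounds):
        raise ValueError("outputs and bounds must have the same length")
    if not outputs:
        return 0.0
    hits = sum(low <= word_count(text) <= high
               for text, (low, high) in zip(outputs, bounds))
    return hits / len(outputs)
```

For example, `length_success_rate(["one two three", "a b", "x y z w"], [(3, 3), (3, 5), (2, 4)])` counts the first and third outputs as successes, giving 2/3.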

Paper Structure

This paper contains 28 sections, 13 equations, 1 figure, 13 tables, 1 algorithm.

Figures (1)

  • Figure 1: The overall sampling process of our Metropolis-Hastings framework. The iteration starts by sampling an initial state from the LLM, $y_0\sim P(y|x)$, and ends at $y_7$, which maximizes the target distribution combining the length constraint and the probability density, $\pi(y|x)\propto f(y)P(y|x)$. In each iteration, a new candidate $y_i$ is generated from the previous state $y_{i-1}$ via the proposal distribution $p(y_i|y_{i-1},x)$ and is then accepted or rejected according to how well it satisfies the target objective. We enhance the original proposal distribution by incorporating length constraints, yielding the importance distribution $q(y_i|y_{i-1},x)$, which increases the acceptance rate of candidates and significantly improves iteration efficiency.
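
The loop in Figure 1 can be illustrated with a self-contained toy; this is not the authors' implementation. Everything here is hypothetical: a "text" is a tuple of words from a four-word `VOCAB`, the stand-in for P(y|x) draws words uniformly, the only moves are appending a word or dropping the last one, and `p_up` plays the role of the importance distribution q by favoring moves toward the target length. The Hastings correction (`log_rev - log_fwd`) keeps the chain targeting f(y)P(y|x) even though q is biased.

```python
import math
import random
from typing import Tuple

Text = Tuple[str, ...]  # toy "text": a tuple of words
VOCAB = ("alpha", "beta", "gamma", "delta")  # hypothetical vocabulary


def log_f(y: Text, target: int, lam: float) -> float:
    """Toy log length constraint: -lam * |len(y) - target|."""
    return -lam * abs(len(y) - target)


def log_p(y: Text) -> float:
    """Toy stand-in for log P(y|x): i.i.d. uniform words over VOCAB."""
    return -len(y) * math.log(len(VOCAB))


def p_up(n: int, target: int) -> float:
    """Importance proposal: probability of lengthening, biased toward target."""
    if n == 0:
        return 1.0  # the empty text can only grow
    if n < target:
        return 0.8
    if n > target:
        return 0.2
    return 0.5


def propose(y: Text, target: int, rng: random.Random) -> Tuple[Text, float, float]:
    """Return (candidate, log q(candidate|y), log q(y|candidate))."""
    n = len(y)
    up = p_up(n, target)
    if rng.random() < up:
        cand = y + (rng.choice(VOCAB),)
        log_fwd = math.log(up) - math.log(len(VOCAB))
        log_rev = math.log(1.0 - p_up(n + 1, target))  # reverse move: drop
    else:
        cand = y[:-1]
        log_fwd = math.log(1.0 - up)
        # reverse move: append the exact word that was just dropped
        log_rev = math.log(p_up(n - 1, target)) - math.log(len(VOCAB))
    return cand, log_fwd, log_rev


def mh_length_control(y0: Text, target: int, n_iters: int,
                      lam: float = 3.0, seed: int = 0) -> Text:
    """Metropolis-Hastings on pi(y) proportional to f(y) P(y);
    returns the highest-pi state visited."""
    rng = random.Random(seed)

    def log_pi(y: Text) -> float:
        return log_f(y, target, lam) + log_p(y)

    current, current_lp = y0, log_pi(y0)
    best, best_lp = current, current_lp
    for _ in range(n_iters):
        cand, log_fwd, log_rev = propose(current, target, rng)
        cand_lp = log_pi(cand)
        # log Hastings ratio: log pi(y') - log pi(y) + log q(y|y') - log q(y'|y)
        log_alpha = cand_lp - current_lp + log_rev - log_fwd
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            current, current_lp = cand, cand_lp
            if current_lp > best_lp:
                best, best_lp = current, current_lp
    return best
```

With `lam` larger than log|VOCAB|, the maximum of f(y)P(y|x) in this toy sits at the target length, so `mh_length_control((), target=5, n_iters=200)` should return a five-word tuple. Replacing the biased values in `p_up` with a constant 0.5 (for n > 0) gives an unbiased proposal that, in this toy, proposes more moves away from the target, most of which are then rejected.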