Precise Length Control in Large Language Models

Bradley Butcher, Michael O'Keefe, James Titchener

TL;DR

The paper addresses the problem of controlling output length in decoder-only large language models by introducing a countdown mechanism via reverse length-difference positional encodings. It adapts LDPE (and the offset variant ORPE) for decoder-only architectures and trains models with a scaled embedding integration to learn to terminate outputs at a user-specified length, additionally proposing Max New Tokens++ to manage upper bounds. Empirical results on QA and document summarisation show precise, token-level length control with minimal degradation in content quality, and the Max New Tokens++ approach demonstrates effective upper-bound termination behavior. The work enhances the practicality of LLMs in production settings that require strict length constraints and opens avenues for counting strategies beyond tokens and broader model generalization.

Abstract

Large Language Models (LLMs) are increasingly used in production systems, powering applications such as chatbots, summarization, and question answering. Despite their success, controlling the length of their response remains a significant challenge, particularly for tasks requiring structured outputs or specific levels of detail. In this work, we propose a method to adapt pre-trained decoder-only LLMs for precise control of response length. Our approach incorporates a secondary length-difference positional encoding (LDPE) into the input embeddings, which counts down to a user-set response termination length. Fine-tuning with LDPE allows the model to learn to terminate responses coherently at the desired length, achieving mean token errors of less than 3 tokens. We also introduce Max New Tokens++, an extension that enables flexible upper-bound length control, rather than an exact target. Experimental results on tasks such as question answering and document summarization demonstrate that our method enables precise length control without compromising response quality.

Paper Structure

This paper contains 25 sections, 8 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: A simplified diagram of the reverse positional encoding methods applied to a target response length of 100 tokens and a question length of 5 tokens ($L=105$ and $n=5$). For LDPE, a positional encoding $PE(i_{\text{LDPE}},k)$ is added to each token embedding. For ORPE, no encoding is added to the prompt part of the text, and an encoding countdown from $i_{\text{ORPE}} = L-n$ to $i_{\text{ORPE}} = 1$ is added to the response.
  • Figure 2: Comparison of target and response lengths for different length control approaches on a question-answering task. The ideal response length is indicated by the dashed line in each panel. Top left: results for prompting Mistral without length control. Top right: results for Mistral fine-tuned for prompt-based length control. Bottom left: results for the LDPE fine-tuned Mistral 7B model. Bottom right: results for the LDPE fine-tuned Llama3 8B model.
  • Figure 3: Results for the length-controlled summarisation task. Models used are LDPE fine-tuned Mistral and Llama (-LDPE), as well as the Mistral baseline model fine-tuned for prompt-based length control (Mistral-Prompted). Left: BERT scores between summaries and GPT-3.5 ground truth summaries. Right: Mean absolute error (MAE) between the number of tokens in the model's summary and the target number of tokens.
  • Figure 4: Performance of length-controlled and baseline models against a range of standard benchmarks. All evaluations used zero-shot prompting. The labels Mistral and Llama correspond to Mistral-7B-Instruct and Llama-3-8B-Instruct respectively. The tag -LDPE means the model was fine-tuned with LDPE; for these models the LDPE encodings were also added during evaluation.
  • Figure 5: Plot of token limit vs. response length for a Max New Tokens++ fine-tuned Mistral model on a question-answering task. This technique has not been applied to Llama, but we expect similar performance given how closely the two models' earlier results for exact length control agree.
  • ...and 1 more figures
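
The countdown described in the Figure 1 caption can be illustrated with a minimal sketch. It uses the caption's values ($L=105$, $n=5$) and rests on two assumptions that the caption does not state: that the LDPE index runs from $L$ down to 1 across the whole sequence, and that $PE$ takes the standard Transformer sinusoidal form. The function names `countdown_indices` and `sinusoidal_encoding` are hypothetical and are not taken from the paper.

```python
import math


def countdown_indices(total_len, prompt_len, variant="ldpe"):
    """Per-token countdown index; None means no encoding is added to that token."""
    if variant == "ldpe":
        # Assumption: LDPE counts down over prompt and response, L, L-1, ..., 1.
        return list(range(total_len, 0, -1))
    if variant == "orpe":
        # From the Figure 1 caption: nothing on the prompt, then L-n, ..., 1.
        response = list(range(total_len - prompt_len, 0, -1))
        return [None] * prompt_len + response
    raise ValueError(f"unknown variant: {variant}")


def sinusoidal_encoding(index, dim):
    """Assumed form of PE(i, k): the standard sinusoid evaluated at a countdown index."""
    enc = []
    for k in range(dim):
        angle = index / (10000 ** (2 * (k // 2) / dim))
        enc.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return enc


# Values from the Figure 1 caption: 5 prompt tokens, 100 response tokens.
ldpe = countdown_indices(105, 5, "ldpe")
orpe = countdown_indices(105, 5, "orpe")
print(ldpe[:3], ldpe[-1])
print(orpe[:6], orpe[-1])
```

In both variants the final response token receives index 1, which is the signal a fine-tuned model could learn to associate with termination. How the encoding is scaled before being added to the token embeddings is not covered by this sketch.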