Hansel: Output Length Controlling Framework for Large Language Models

Seoha Song, Junhyun Lee, Hyeonmok Ko

TL;DR

This work tackles the challenge of controlling output length in large language models (LLMs) during generation. It introduces Hansel, a finetuning-based framework that inserts hidden special tokens at regular intervals to signal the remaining output length, so that the model learns general length-control behavior regardless of its positional encoding. The approach substantially reduces mean absolute error (MAE) relative to prompt-based length-control finetuning (Gretel) across four datasets and four LLMs, and it extrapolates robustly to target lengths unseen during finetuning. Hansel also works across different positional-encoding schemes, reduces abrupt termination, and mitigates infinite-generation issues, while preserving coherence and fluency. The practical impact is a versatile, architecture-agnostic method for reliable length-controlled generation in summarization and dialogue tasks, with potential for multi-unit length control and efficient transfer to new domains.

Abstract

Despite the great success of large language models (LLMs), efficiently controlling the length of the output sequence remains a challenge. In this paper, we propose Hansel, an efficient framework for length control in LLMs that does not affect their generation ability. Hansel uses periodically output hidden special tokens to keep track of the remaining target length of the output sequence. Together with techniques to avoid abrupt termination of the output, this seemingly simple method proved to be efficient and versatile, while not harming the coherence and fluency of the generated text. The framework can be applied to any pre-trained LLM during the finetuning stage, regardless of its original positional encoding method. We demonstrate this by finetuning four different LLMs with Hansel and show that the mean absolute error of the output sequence length decreases significantly for every model and dataset compared to prompt-based length-control finetuning. Moreover, the framework showed a substantially improved ability to extrapolate to target lengths unseen during finetuning, such as long dialog responses or extremely short summaries. This indicates that the model learns a general means of length control, rather than learning to match output lengths to those seen during training.
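The idea of periodic hidden tokens that track the remaining length can be illustrated with a minimal sketch. This is not the authors' implementation: the marker names `<len_N>` and `<rem_K>`, the `period` parameter, and both helper functions are hypothetical and chosen only for illustration. The sketch prepends the target length to a tokenized output and inserts a remaining-length marker every `period` tokens, which is roughly what a finetuning target might look like. A second helper removes the markers so they stay hidden from the reader.

```python
from typing import List


def add_length_markers(tokens: List[str], period: int = 4) -> List[str]:
    """Interleave hypothetical remaining-length markers into a target sequence.

    Assumption: the marker format is invented here; the paper's actual
    special tokens and spacing may differ.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    total = len(tokens)
    out = [f"<len_{total}>"]  # target length is given up front
    for i, tok in enumerate(tokens):
        # Every `period` tokens, record how many tokens are still to come.
        if i > 0 and i % period == 0:
            out.append(f"<rem_{total - i}>")
        out.append(tok)
    return out


def strip_length_markers(tokens: List[str]) -> List[str]:
    """Drop the hypothetical markers so that only visible text remains."""
    return [t for t in tokens if not t.startswith(("<len_", "<rem_"))]


example = "the cat sat on the mat today".split()
marked = add_length_markers(example, period=3)
visible = strip_length_markers(marked)
```

Here `marked` holds the seven original words plus one leading target-length marker and two remaining-length markers, and `visible` recovers the original words. Because the markers count down rather than up, the signal does not rely on absolute position, which fits the abstract's claim that the framework is independent of the positional encoding method.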

Paper Structure

This paper contains 38 sections, 7 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: An example conversation with GPT. We have edited the example sentence for brevity.
  • Figure 2: Schematic of the Hansel framework, compared with the vanilla and Gretel scheme. Vanilla is normal fine-tuning and Gretel is the prompt-based length-aware fine-tuning. Hansel receives the target length as a special token and regularly places additional special tokens (marked as @ in the figure) that inform the position while fine-tuning.
  • Figure 3: The extrapolation of the length control methods with different target lengths. The dashed line (shaded region) indicates the mean length ($\pm$ standard deviation) of the dataset. While the MAE of other methods increases drastically when the target length is different from that of the dataset, our method (Hansel) shows robust performance.
  • Figure 4: The extrapolation of the length control methods with different target lengths. The dashed line (shaded region) indicates the mean length ($\pm$ standard deviation) of the dataset. While the MAE increases drastically when the target length is different from that of the dataset, our method (Hansel) shows robust performance.
  • Figure 5: The MAE and ROUGE-L for the Phi-2 Hansel model as we increase the number of epochs. The performances are plotted as the ratio to the second epoch performance.
  • ...and 1 more figure