
Exploring Iterative Controllable Summarization with Large Language Models

Sangwon Ryu, Heejin Do, Daehee Kim, Hwanjo Yu, Dongwoo Kim, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok

TL;DR

This work identifies a gap in precisely controllable summarization by LLMs and introduces a refined attribute measurement suite for extractiveness, length, topic, and speaker, along with iterative evaluation metrics. It proposes Guide-to-Explain (GTE), a two-phase framework that uses step-by-step attribute identification and self-explanation guidance to steer LLMs toward attribute-aligned summaries within a few iterations. Empirical results on the MACSum datasets show that GTE substantially reduces both failures and the number of iterations while preserving or improving overall summary quality, even for numerically constrained attributes. The findings highlight both the feasibility and the limits of iterative controllable summarization with LLMs and point to future work on balancing multiple correlated attributes and exploring more robust planning strategies.

Abstract

Large language models (LLMs) have demonstrated remarkable performance in abstractive summarization tasks. However, their ability to precisely control summary attributes (e.g., length or topic) remains underexplored, limiting their adaptability to specific user preferences. In this paper, we systematically explore the controllability of LLMs. To this end, we revisit summary attribute measurements and introduce two iterative evaluation metrics, failure rate and average iteration count, to precisely evaluate the controllability of LLMs rather than merely assessing errors. Our findings show that LLMs struggle more with numerical attributes than with linguistic attributes. To address this challenge, we propose a guide-to-explain (GTE) framework for controllable summarization. GTE enables the model to identify misaligned attributes in the initial draft and guides it in self-explaining the errors in its previous output. By allowing the model to reflect on its misalignment, GTE generates well-adjusted summaries that satisfy the desired attributes with robust effectiveness, requiring surprisingly few iterations compared with other iterative approaches.
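
The abstract names the two iterative metrics but does not define them here, so the following is only a minimal sketch under one plausible reading: each sample is revised until its attribute meets the target or an iteration budget runs out, the failure rate is the share of samples that never meet the target, and the average iteration count is taken over the samples that do. The function names and the `None`-for-failure encoding are assumptions made for illustration, not the paper's implementation.

```python
from typing import Optional, Sequence


def failure_rate(iterations: Sequence[Optional[int]]) -> float:
    """Share of samples that never met the target within the budget.

    Each entry is the iteration at which a sample first satisfied the
    attribute, or None if it never did (assumed encoding).
    """
    if not iterations:
        raise ValueError("need at least one sample")
    failures = sum(1 for it in iterations if it is None)
    return failures / len(iterations)


def average_iterations(iterations: Sequence[Optional[int]]) -> float:
    """Mean iteration count over successful samples only (assumed convention)."""
    successes = [it for it in iterations if it is not None]
    if not successes:
        return float("nan")
    return sum(successes) / len(successes)


# Hypothetical outcomes for four samples: three succeed, one fails.
outcomes = [1, 3, None, 2]
print(failure_rate(outcomes))        # 0.25
print(average_iterations(outcomes))  # 2.0
```

Whether failed samples should instead count at the maximum iteration budget is a design choice this sketch leaves open; the paper's own definition should take precedence.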

Paper Structure

This paper contains 31 sections, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Ambiguous instructions hinder LLMs' ability to follow control signals and complicate the evaluation process (e.g., how should “highly” be judged in a generated summary?).
  • Figure 2: LLMs show notable errors in word count estimation: for an article with 484 words and a summary with 157 words, the model predicts 668 and 159 words, respectively, revealing limitations in self-critique within controllable summarization.
  • Figure 3: Overview of the guide-to-explain (GTE) system. The pink parts represent the step-by-step attribute identification, and the blue parts correspond to the self-explanation guidance.
  • Figure 4: The graphs show how the length ratio changes for each iteration. The intensity of the distribution color is proportional to the number of data points, and the markers represent the average values. The red line indicates the target length, with values of 7.5%, 20%, and 32.5% from left to right.
  • Figure 5: Correlations among attributes hinder LLMs’ ability to control them jointly in the mixed-attribute setting.
  • ...and 7 more figures