
Enhancing Long Document Long Form Summarisation with Self-Planning

Xiaotang Du, Rohit Saxena, Laura Perez-Beltrachini, Pasquale Minervini, Ivan Titov

TL;DR

Large language models struggle to produce faithful and concise summaries of long documents. The authors propose highlight-guided generation (HiGen), which uses sentence-level plans derived from the input to guide summarisation, in end-to-end and two-stage variants. Evaluations on GovReport and QMSum show that two-stage HiGen improves both relevance and factual consistency, outperforming baselines and attribution-based planning. Generative highlights provide more coherent, information-rich plans than perturbation-based attribution, especially for dense documents. The approach offers a flexible, training-free content-planning mechanism for long-context summarisation.
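
The two variants can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm` callable, the helper names (`split_sentences`, `extract_highlights`, `summarise_with_plan`, `higen_two_stage`, `higen_end_to_end`), the highlight budget `k`, and the prompt wording are all hypothetical. The paper's actual prompts are the ones shown in Figures 2 to 5. The two-stage version first asks the model to select important sentences (the plan) and then conditions a second call on them; the end-to-end version asks for highlights and summary in a single call.

```python
import re
from typing import Callable, List

# Hypothetical LLM interface: takes a prompt string, returns generated text.
LLM = Callable[[str], str]


def split_sentences(document: str) -> List[str]:
    # Naive splitter on ., ! or ? followed by whitespace (assumption; a real system would use a proper tokenizer).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def extract_highlights(llm: LLM, sentences: List[str], k: int) -> List[int]:
    """Stage 1 (self-planning): ask the model for the indices of the k most important sentences."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sentences))
    prompt = (
        f"Document:\n{numbered}\n\n"
        f"List the indices of the {k} most important sentences, comma-separated."
    )
    reply = llm(prompt)
    indices: List[int] = []
    for tok in re.findall(r"\d+", reply):
        i = int(tok)
        # Drop out-of-range or repeated indices the model may produce.
        if i < len(sentences) and i not in indices:
            indices.append(i)
    # Keep the model's top-k choices, then restore document order for the plan.
    return sorted(indices[:k])


def summarise_with_plan(llm: LLM, document: str, sentences: List[str], plan: List[int]) -> str:
    """Stage 2: generate the summary conditioned on the extracted highlights."""
    highlights = "\n".join(f"- {sentences[i]}" for i in plan)
    prompt = (
        f"Document:\n{document}\n\nKey sentences:\n{highlights}\n\n"
        "Write a summary of the document grounded in the key sentences."
    )
    return llm(prompt)


def higen_two_stage(llm: LLM, document: str, k: int = 3) -> str:
    sentences = split_sentences(document)
    plan = extract_highlights(llm, sentences, k)
    return summarise_with_plan(llm, document, sentences, plan)


def higen_end_to_end(llm: LLM, document: str) -> str:
    """End-to-end variant: one call produces highlights, then the summary; keep only the summary part."""
    prompt = (
        f"Document:\n{document}\n\n"
        "First copy the most important sentences under 'Highlights:', "
        "then write 'Summary:' followed by the summary."
    )
    reply = llm(prompt)
    _, sep, summary = reply.partition("Summary:")
    return summary.strip() if sep else reply.strip()
```

Separating the calls lets the second prompt see an explicit, inspectable plan, which is where the traceability benefit comes from; the end-to-end form is cheaper but leaves plan and summary entangled in one generation.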

Abstract

We introduce a novel approach for long-context summarisation, highlight-guided generation, that leverages sentence-level information as a content plan to improve the traceability and faithfulness of generated summaries. Our framework applies self-planning methods to identify important content and then generates a summary conditioned on the plan. We explore both end-to-end and two-stage variants of the approach, finding that the two-stage pipeline performs better on long and information-dense documents. Experiments on long-form summarisation datasets demonstrate that our method consistently improves factual consistency while preserving relevance and overall quality. On GovReport, our best approach improves ROUGE-L by 4.1 points and achieves gains of about 35% in SummaC scores. Qualitative analysis shows that highlight-guided summarisation helps preserve important details, leading to more accurate and insightful summaries across domains.
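
For readers unfamiliar with the relevance metric quoted above, ROUGE-L scores a candidate against a reference via their longest common subsequence (LCS) of tokens. The sketch below computes a sentence-level ROUGE-L F1 with whitespace tokenisation and equal weighting of precision and recall; these choices are assumptions, and the paper's numbers were presumably produced with a standard package (possibly the summary-level ROUGE-Lsum variant, with its own tokenisation and stemming), so this code will not reproduce them exactly. Scores are conventionally reported multiplied by 100, so a gain of "4.1 points" corresponds to 0.041 on this 0 to 1 scale.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    # Dynamic programming over token positions, keeping only one previous row in memory.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    # Lower-cased whitespace tokenisation (assumption; real toolkits also strip punctuation and may stem).
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    # Harmonic mean of precision and recall (beta = 1).
    return 2 * precision * recall / (precision + recall)
```

For example, "the cat sat on the mat" against "the cat is on the mat" shares the 5-token subsequence "the cat on the mat", giving precision and recall of 5/6 and therefore an F1 of 5/6.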

Paper Structure

This paper contains 21 sections, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Illustration of highlight-guided generation framework for summarisation. The generated summary is grounded by the influential sentences extracted by the same model architecture.
  • Figure 2: Prompt used for end-to-end highlight extraction and summary generation on GovReport
  • Figure 3: Prompt used for generating the summary with the two-step pipeline on GovReport.
  • Figure 4: Prompt used for end-to-end highlight extraction and summary generation on QMSum
  • Figure 5: Prompt used for generating the summary with the two-step pipeline on QMSum.