Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs

Mihir Parmar, Hanieh Deilamsalehy, Franck Dernoncourt, Seunghyun Yoon, Ryan A. Rossi, Trung Bui

TL;DR

This work targets coherence in extractive summarization by introducing a human-annotated dataset that captures user-intent-aware coherence through natural language feedback. It frames coherent extractive summarization as sentence selection guided by that feedback and shows that fine-tuning open-source LLMs (decoder-only and encoder–decoder) on the dataset yields roughly a 10% Rouge-L gain and favorable human judgments. The effects depend on model type: decoder-only models benefit from feedback and full-parameter training during fine-tuning, while encoder–decoder models gain from pre-finetuning on feedback data. The released data and code offer a practical resource for coherence-oriented extractive summarization and invite broader exploration across models and languages.

Abstract

Extractive summarization plays a pivotal role in natural language processing due to its wide range of applications in summarizing diverse content efficiently while remaining faithful to the original content. Despite the significant advances that Large Language Models (LLMs) have brought to extractive summarization, the resulting summaries frequently exhibit incoherence. An important aspect of a coherent summary is its readability for intended users. Although many datasets and benchmarks have been proposed for creating coherent extractive summaries, none of them currently incorporates user intent to improve coherence in extractive summarization. Motivated by this, we propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback, offering valuable insights into how to improve coherence in extractive summaries. We utilize this dataset for aligning LLMs through supervised fine-tuning with natural language human feedback to enhance the coherence of their generated summaries. Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (~10% Rouge-L) in terms of producing coherent summaries. We further utilize human feedback to benchmark results over instruction-tuned models such as FLAN-T5, which yielded several interesting findings. Data and source code are available at https://github.com/Mihir3009/Extract-AI.
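
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of a sentence-level Rouge-L F1 score based on the longest common subsequence (LCS) of tokens. It is illustrative only: the whitespace tokenization, lowercasing, the equal weighting of precision and recall, and the function names `lcs_length` and `rouge_l_f1` are assumptions made here, and the paper's actual evaluation code may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Rolling one-row dynamic programme: prev[j] holds the LCS length
    # of the tokens of `a` seen so far and the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Sentence-level Rouge-L F1 (assumption: whitespace tokens, beta = 1)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "the model selects sentences from the document"
    gold = "the model selects coherent sentences from the source document"
    print(round(rouge_l_f1(generated, gold), 3))
```

Because the LCS rewards tokens that appear in the same order in both texts, the score is sensitive to sentence ordering as well as content overlap, which is one reason it is commonly reported for summarization.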

Paper Structure (36 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Schematic representation of our natural language feedback collection pipeline and aligning LLMs with provided human feedback.
  • Figure 2: Illustration of an annotated instance.
  • Figure 3: Performance of (a) the decoder-only model and (b) the encoder–decoder model on our proposed dataset.
  • Figure 4: Average number of preferences across three evaluators.