'Keep it Together': Enforcing Cohesion in Extractive Summaries by Simulating Human Memory

Ronald Cardenas, Matthias Galle, Shay B. Cohen

TL;DR

This work addresses the tension between informativeness, redundancy, and cohesion in extractive summarization with a two-stage control framework. The first stage reduces redundancy while the long input is consumed block by block; the second balances informativeness and cohesion during sentence selection using KvD-Select, a memory-inspired selector that simulates how human memory keeps track of topics as lexical chains. The resulting summaries show stronger lexical cohesion and smoother topic transitions while, according to the reported automatic metrics and human evaluations, maintaining or improving informativeness across several domains. The method offers a parameterizable way to generate cohesive multi-sentence extracts, which may benefit technical domains where readability and traceability of content are critical.

Abstract

Extractive summaries are usually presented as lists of sentences with no expected cohesion between them. In this paper, we aim to enforce cohesion whilst controlling for informativeness and redundancy in summaries, in cases where the input exhibits high redundancy. The pipeline controls for redundancy in long inputs as they are consumed, and balances informativeness and cohesion during sentence selection. Our sentence selector simulates human memory to keep track of topics, modeled as lexical chains, enforcing cohesive ties between noun phrases. Across a variety of domains, our experiments revealed that it is possible to extract highly cohesive summaries that nevertheless read as informative to humans as summaries extracted by only accounting for informativeness or redundancy. The extracted summaries exhibit smooth topic transitions between sentences as signaled by lexical chains, with chains spanning adjacent or near-adjacent sentences.
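
To make the informativeness/cohesion balance concrete, the following is a toy sketch of a greedy selector. It is not the paper's KvD-Select: the convex combination of the two scores, the direction of `lambda_sel` (higher favors informativeness), the `memory_size` window, and the representation of each sentence as a set of noun-phrase strings are all assumptions made for illustration.

```python
from typing import List, Set


def select_sentences(
    candidates: List[Set[str]],
    informativeness: List[float],
    budget: int,
    lambda_sel: float = 0.5,
    memory_size: int = 2,
) -> List[int]:
    """Greedy toy selector (hypothetical); returns chosen indices in selection order."""
    selected: List[int] = []
    while len(selected) < min(budget, len(candidates)):
        # Simulated working memory: noun phrases of the most recently
        # selected sentences (assumed fixed-size window).
        memory: Set[str] = set()
        for idx in selected[-memory_size:]:
            memory |= candidates[idx]

        best_idx, best_score = -1, float("-inf")
        for i, phrases in enumerate(candidates):
            if i in selected:
                continue
            # Cohesion proxy: share of the candidate's phrases already in memory.
            cohesion = len(phrases & memory) / len(phrases) if phrases else 0.0
            # Assumed trade-off: convex combination controlled by lambda_sel.
            score = lambda_sel * informativeness[i] + (1.0 - lambda_sel) * cohesion
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected
```

With three candidates whose noun phrases are `{"summary", "cohesion"}`, `{"cohesion", "memory"}`, and `{"patent", "claim"}`, and informativeness scores `[0.9, 0.5, 0.6]`, setting `lambda_sel=1.0` picks the two most informative sentences, whereas `lambda_sel=0.0` follows the first pick with the sentence that shares a noun phrase with it.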

Paper Structure (55 sections, 4 equations, 6 figures, 11 tables)

Figures (6)

  • Figure 1: Our extraction pipeline: local extraction step $m$ adds local sentences to $D'$; at sentence selection step $t$, KvD-Select balances informativeness of candidate $s_t$ with cohesion of summary $\hat{S}$.
  • Figure 2: Effect of block selection strategy over input redundancy (left), summary informativeness (center), and summary redundancy (right), evaluated as block selection proceeds on the MultiNews validation dataset.
  • Figure 3: Informativeness (left), redundancy (center), and cohesion (right) in summaries, across increasing values of trade-off parameter $\lambda_{\text{sel}}$, on the validation set of MultiNews.
  • Figure 4: Effect of block selection strategy over input redundancy (left), summary informativeness (center), and summary redundancy (right), evaluated as block selection proceeds on the validation splits of all datasets analysed.
  • Figure 5: Informativeness (left), redundancy (center), and lexical cohesion (right) across different values of the trade-off parameter $\lambda_{\text{sel}}$ on the validation sets of PubMed, BigPatent.C, GovReport, and MultiNews.
  • ...and 1 more figure