SumHiS: Extractive Summarization Exploiting Hidden Structure

Tikhonov Pavel, Anastasiya Ianina, Valentin Malykh

TL;DR

SumHiS addresses extractive summarization by jointly ranking sentences and discovering hidden document structure to emphasize main topics. Built on BERT-based representations, it achieves state-of-the-art ROUGE-2 and strong ROUGE-L on CNN/Daily Mail, and surpasses abstractive baselines in ROUGE-2 by a substantial margin. The structure-discovery component, inspired by ABAE, filters ranked sentences by topical units, with ablations confirming its contribution. The results demonstrate the value of incorporating hidden topical structure into extractive summarization and point toward end-to-end training and integration with abstractive approaches as promising directions.

Abstract

Extractive summarization is the task of highlighting the most important parts of a text. We introduce a new approach to extractive summarization that uses the hidden clustering structure of the text. Experimental results on CNN/DailyMail demonstrate that our approach generates more accurate summaries than both extractive and abstractive methods, achieving state-of-the-art ROUGE-2 results and exceeding previous approaches by 10%. Additionally, we show that the hidden structure of the text can be interpreted as aspects.

Paper Structure (18 sections, 8 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: SumHiS: ranking model (right) + hidden structure discovery model (left).
  • Figure 2: Ranking model.
  • Figure 3: Sample of SumHiS generated summary.
  • Figure 4: True Positive Rate vs. False Positive Rate for SumHiS with different threshold values.
  • Figure 5: Histogram of distances between initial text and positive (blue) / negative (orange) sentences.
  • ...and 1 more figures