SumHiS: Extractive Summarization Exploiting Hidden Structure
Pavel Tikhonov, Anastasiya Ianina, Valentin Malykh
TL;DR
SumHiS addresses extractive summarization by jointly ranking sentences and discovering hidden document structure to emphasize main topics. Built on BERT-based representations, it achieves state-of-the-art ROUGE-2 and strong ROUGE-L on CNN/Daily Mail, and surpasses abstractive baselines in ROUGE-2 by a substantial margin. The structure-discovery component, inspired by ABAE, filters ranked sentences by topical units, with ablations confirming its contribution. The results demonstrate the value of incorporating hidden topical structure into extractive summarization and point toward end-to-end training and integration with abstractive approaches as promising directions.
Abstract
Extractive summarization is the task of selecting the most important parts of a text. We introduce a new approach to extractive summarization that uses the hidden clustering structure of the text. Experimental results on CNN/DailyMail demonstrate that our approach generates more accurate summaries than both extractive and abstractive methods, achieving state-of-the-art results on the ROUGE-2 metric and exceeding previous approaches by 10%. Additionally, we show that the hidden structure of the text can be interpreted as aspects.
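
Since ROUGE-2 is the headline metric here, a minimal sketch of how it is computed may help: it measures bigram overlap between a candidate summary and a reference summary. The function names `bigrams` and `rouge_2` and the whitespace tokenization are illustrative assumptions, not the evaluation code used in the paper; published results typically rely on a standard ROUGE package with its own tokenization and stemming.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent token pairs, using naive lowercase whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(candidate, reference):
    """Return ROUGE-2 precision, recall, and F1 from clipped bigram overlap."""
    cand, ref = bigrams(candidate), bigrams(reference)
    # Counter intersection keeps the minimum count per bigram (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_2("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

In this toy pair, three of the five bigrams on each side match, so precision, recall, and F1 all come out at roughly 0.6.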
