Classify or Select: Neural Architectures for Extractive Document Summarization

Ramesh Nallapati, Bowen Zhou, Mingbo Ma

TL;DR

The paper develops two RNN-based architectures, Classifier and Selector, for extractive single-document summarization, both using a hierarchical encoding and a score-based mechanism over abstract features (salience, novelty, content, redundancy, position). It evaluates shallow and deep variants on Daily Mail and DUC 2002, showing deep Classifier models achieve strong performance and often surpass baselines, with Selector offering benefits in less-structured settings. A novel abstractive training approach links SummaRuNNer with an RNN decoder to train from abstractive references, while extensive qualitative analyses highlight interpretability via feature visualization and learned weights. The work also discusses domain transfer implications and suggests directions for applying the Selector to more unstructured tasks and for incorporating beam search in future work.

Abstract

We present two novel and contrasting Recurrent Neural Network (RNN) based architectures for extractive summarization of documents. The Classifier based architecture sequentially accepts or rejects each sentence in the original document order for its membership in the final summary. The Selector architecture, on the other hand, is free to pick one sentence at a time in any arbitrary order to piece together the summary. Our models under both architectures jointly capture the notions of salience and redundancy of sentences. In addition, these models have the advantage of being very interpretable, since they allow visualization of their predictions broken up by abstract features such as information content, salience and redundancy. We show that our models reach or outperform state-of-the-art supervised models on two different corpora. We also recommend the conditions under which one architecture is superior to the other based on experimental evidence.
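
The contrast between the two decision procedures described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: the RNN encoders and the learned scoring function are replaced by a hypothetical stand-in, `toy_score`, which simply counts the words in a sentence that the summary so far does not cover (a crude proxy for content and novelty). The function names, the threshold and the sentence budget are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical scorer signature: (candidate sentence, summary so far) -> score.
Scorer = Callable[[str, List[str]], float]


def toy_score(sentence: str, summary: List[str]) -> float:
    """Stand-in for a learned scorer: number of words not yet in the summary."""
    words = set(sentence.lower().split())
    covered = {w for s in summary for w in s.lower().split()}
    return float(len(words - covered))


def classify(sentences: List[str], score: Scorer, threshold: float = 2.0) -> List[str]:
    """Classifier-style pass: visit sentences in document order and
    accept or reject each one given the summary built so far."""
    summary: List[str] = []
    for sentence in sentences:
        if score(sentence, summary) > threshold:
            summary.append(sentence)
    return summary


def select(sentences: List[str], score: Scorer, budget: int = 2) -> List[str]:
    """Selector-style pass: repeatedly pick the best remaining sentence,
    in any order, until the (assumed) sentence budget is reached."""
    summary: List[str] = []
    remaining = list(sentences)
    while remaining and len(summary) < budget:
        best = max(remaining, key=lambda s: score(s, summary))
        summary.append(best)
        remaining.remove(best)
    return summary


docs = ["the cat sat", "the cat sat", "a dog barked loudly at night"]
print(classify(docs, toy_score))  # keeps document order, skips the repeated sentence
print(select(docs, toy_score))    # picks the most informative sentence first
```

In this toy run both procedures skip the repeated sentence, but the Classifier-style pass preserves document order while the Selector-style pass emits the longest sentence first, which mirrors the ordering freedom the abstract attributes to the Selector.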

Paper Structure

This paper contains 13 sections, 8 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: The shallow and deep versions of the Classifier architecture for extractive summarization.
  • Figure 2: Selector architecture for extractive summarization. The shallow and deep versions are identical except for the fact that the simple vector representation for summary representation in the shallow version is replaced with a gated recurrent unit in the deep version.
  • Figure 3: Visualization of Deep Classifier output on a representative document. Each row is a sentence in the document, while the shading-color intensity in the first column is proportional to its probability of being in the summary, as estimated by the scoring function. In the columns are the normalized scores from each of the abstract features in Eqn. (\ref{eq:scoring}) as well as the final prediction probability (last column). Sentence 2 is estimated to be the most salient, while the longest one, sentence 4, is considered the most content-rich, and not surprisingly, the first sentence the most novel. The third sentence gets the best position-based score.