Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints

Greg Durrett, Taylor Berg-Kirkpatrick, Dan Klein

TL;DR

Durrett, Berg-Kirkpatrick, and Klein present a discriminative, ILP-based framework for single-document summarization that jointly handles compression and anaphoricity constraints. The model scores textual units with a rich set of sparse features whose weights are learned on a large corpus (the New York Times Annotated Corpus), and it combines RST-based and syntactic compressions with pronoun rewriting and antecedent constraints. It is trained end-to-end as a structured SVM with loss-augmented decoding, and it outperforms prior work on ROUGE and on human judgments of linguistic quality. The results show that strong content selection can coexist with fluency and referential coherence under expressive compression constraints.
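
The training recipe above (structured SVM with loss-augmented decoding) can be illustrated with a small sketch. It rests on simplifying assumptions that are ours, not the paper's: exhaustive enumeration replaces the ILP solver, the compression and anaphoricity constraints are omitted, a per-unit Hamming loss stands in for the summarization loss (which this summary does not specify), and plain subgradient descent with hypothetical `lr` and `reg` settings is used. Feature names such as `position=0` and the toy document are invented for illustration.

```python
from collections import defaultdict
from itertools import product


def feats(units, y):
    """Sum the sparse features of the selected units: f(y) = sum_u y_u * f(u)."""
    total = defaultdict(float)
    for u, sel in zip(units, y):
        if sel:
            for k, v in u["features"].items():
                total[k] += v
    return total


def dot(w, f):
    return sum(w.get(k, 0.0) * v for k, v in f.items())


def hamming(y, gold):
    # Stand-in loss (assumption): number of units whose selection differs from the reference.
    return sum(a != b for a, b in zip(y, gold))


def decode(units, w, budget, gold=None):
    """Exhaustive argmax of w.f(y), plus loss(y, gold) when gold is given."""
    best, best_val = None, float("-inf")
    for y in product((0, 1), repeat=len(units)):
        if sum(u["length"] * s for u, s in zip(units, y)) > budget:
            continue  # violates the length constraint
        val = dot(w, feats(units, y))
        if gold is not None:
            val += hamming(y, gold)  # loss-augmented decoding
        if val > best_val:
            best, best_val = y, val
    return best


def train(data, budget, epochs=10, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized structured hinge loss."""
    w = defaultdict(float)
    for _ in range(epochs):
        for units, gold in data:
            y_hat = decode(units, w, budget, gold=gold)
            # Subgradient of reg/2*||w||^2 + max_y[w.f(y) + loss(y)] - w.f(gold).
            for k in list(w):
                w[k] -= lr * reg * w[k]
            for k, v in feats(units, y_hat).items():
                w[k] -= lr * v
            for k, v in feats(units, gold).items():
                w[k] += lr * v
    return w


# Toy document (invented): three 3-word units; the reference keeps only the first.
units = [
    {"features": {"position=0": 1.0}, "length": 3},
    {"features": {"quote": 1.0}, "length": 3},
    {"features": {"position=2": 1.0}, "length": 3},
]
w = train([(units, (1, 0, 0))], budget=3)
print(decode(units, w, budget=3))  # → (1, 0, 0)
```

Because the loss is added inside the argmax, each update pushes the reference summary to outscore every alternative by a margin that grows with that alternative's loss.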

Abstract

We present a discriminative model for single-document summarization that integrally combines compression and anaphoricity constraints. Our model selects textual units to include in the summary based on a rich set of sparse features whose weights are learned on a large corpus. We allow for the deletion of content within a sentence when that deletion is licensed by compression rules; in our framework, these are implemented as dependencies between subsentential units of text. Anaphoricity constraints then improve cross-sentence coherence by guaranteeing that, for each pronoun included in the summary, the pronoun's antecedent is included as well or the pronoun is rewritten as a full mention. When trained end-to-end, our final system outperforms prior work on both ROUGE and human judgments of linguistic quality.
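
The constraints described in the abstract can be made concrete with a toy decoder. This is a minimal sketch under stated assumptions: each unit's learned score w · f(u) is given directly as a number, each pronoun can be rewritten at a fixed extra length and score, and the ILP is solved by exhaustive enumeration instead of an ILP solver, which is only viable for a handful of units. The class and function names and the example sentences are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Unit:
    text: str
    length: int                      # words contributed if the unit is selected
    score: float                     # stands in for the learned score w . f(u)
    requires: list = field(default_factory=list)  # units this one depends on


@dataclass
class Pronoun:
    unit: int                        # index of the unit containing the pronoun
    antecedents: list                # units, any of which supplies an antecedent
    extra_length: int                # words added by rewriting it as a full mention
    score: float                     # stands in for the score of the rewrite


def feasible(units, pronouns, x_unit, x_ref, budget):
    # Compression constraints: a unit may appear only with the units it requires.
    for i, u in enumerate(units):
        if x_unit[i] and not all(x_unit[j] for j in u.requires):
            return False
    for p, r in zip(pronouns, x_ref):
        if r and not x_unit[p.unit]:
            return False             # cannot rewrite a pronoun that is not in the summary
        # Anaphoricity: an included pronoun needs an included antecedent or a rewrite.
        if x_unit[p.unit] and not r and not any(x_unit[a] for a in p.antecedents):
            return False
    length = sum(u.length * x for u, x in zip(units, x_unit))
    length += sum(p.extra_length * r for p, r in zip(pronouns, x_ref))
    return length <= budget          # rewrites count toward the length limit


def decode(units, pronouns, budget):
    """Exact argmax by enumerating all 0/1 assignments (exponential in size)."""
    n, m = len(units), len(pronouns)
    best, best_value = None, float("-inf")
    for bits in product((0, 1), repeat=n + m):
        x_unit, x_ref = bits[:n], bits[n:]
        if not feasible(units, pronouns, x_unit, x_ref, budget):
            continue
        value = sum(u.score * x for u, x in zip(units, x_unit))
        value += sum(p.score * r for p, r in zip(pronouns, x_ref))
        if value > best_value:
            best, best_value = (x_unit, x_ref), value
    return best, best_value


units = [
    Unit("Kellogg reported weak sales", 4, 1.0),
    Unit("It plans to cut jobs", 5, 3.0),
    Unit("after a long review", 4, 0.5, requires=[1]),  # modifier of unit 1
]
pronouns = [Pronoun(unit=1, antecedents=[0], extra_length=1, score=-0.5)]
print(decode(units, pronouns, budget=6))  # → (((0, 1, 0), (1,)), 2.5)
print(decode(units, pronouns, budget=9))  # → (((1, 1, 0), (0,)), 4.0)
```

With a 6-word budget the antecedent sentence does not fit, so the decoder keeps the pronoun's sentence and pays for a rewrite; with 9 words it keeps the antecedent instead and leaves the pronoun as is.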

Paper Structure

This paper contains 20 sections, 3 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: ILP formulation of our single-document summarization model. The basic model extracts a set of textual units with binary variables $\mathbf{x}^\textrm{unit}$ subject to a length constraint. These textual units $\mathbf{u}$ are scored with weights $\mathbf{w}$ and features $\mathbf{f}$. Next, we add constraints derived from both syntactic parses and Rhetorical Structure Theory (RST) to enforce grammaticality. Finally, we add anaphora constraints derived from coreference in order to improve summary coherence. We introduce additional binary variables $\mathbf{x}^\textrm{ref}$ that control whether each pronoun is replaced with its antecedent using a candidate replacement $r_{ij}$. These are also scored in the objective and are incorporated into the length constraint.
  • Figure 2: Compression constraints on an example sentence. (a) RST-based compression structure like that in ?), where we can delete the Elaboration clause. (b) Two syntactic compression options from ?), namely deletion of a coordinate and deletion of a PP modifier. (c) Textual units and requirement relations (arrows) after merging all of the available compressions. (d) Process of augmenting a textual unit with syntactic compressions.
  • Figure 3: Modifications to the ILP to capture pronoun coherence. The pronoun "It", which refers to Kellogg, has several possible antecedents from the standpoint of an automatic coreference system (Durrett and Klein, 2014). If the coreference system is confident about its selection (posterior probability above a threshold $\alpha$), we allow the model to explicitly replace the pronoun if its antecedent would be deleted (see the section on pronoun replacement). Otherwise, we merely constrain one or more probable antecedents to be included (see the section on antecedent constraints); even if the coreference system is incorrect, a human can often correctly interpret the pronoun with this additional context.
  • Figure 4: Examples of an article kept in the NYT50 dataset (top) and an article removed because its summary is too short (bottom). The top summary has rich structure, corresponding to various parts of the document (bolded), and includes some text that is essentially a direct extraction.
  • Figure 5: Counts, on a 1000-document sample, of how frequently a document-prefix baseline and a ROUGE oracle summary each contain sentences at various indices in the document. There is a long tail of useful sentences later in the document: the oracle sentence counts drop off relatively slowly. Smart content selection therefore has room to improve over taking a prefix of the document.
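
The two cases in the Figure 3 caption amount to a small rule for turning coreference posteriors into constraints. The following is a minimal sketch, not the paper's implementation: it assumes each pronoun arrives with a map from candidate antecedent units to posterior probabilities, reads "one or more probable antecedents" as every candidate at or above a hypothetical `min_prob` cutoff (falling back to the top candidate), and uses invented defaults for both `min_prob` and the caption's threshold `alpha`. It checks a candidate summary directly rather than emitting ILP rows, and the unit indices in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class PronounMention:
    unit: int          # textual unit containing the pronoun
    candidates: dict   # antecedent unit index -> coreference posterior


def build_constraints(pronouns, alpha=0.8, min_prob=0.2):
    """Turn coreference posteriors into one constraint per pronoun."""
    constraints = []
    for i, p in enumerate(pronouns):
        best_unit, best_prob = max(p.candidates.items(), key=lambda kv: kv[1])
        if best_prob > alpha:
            # Confident: allow a rewrite with this antecedent if it is deleted.
            constraints.append(("replace", i, p.unit, best_unit))
        else:
            # Not confident: require every probable antecedent (at least the top one).
            required = sorted(u for u, pr in p.candidates.items() if pr >= min_prob)
            constraints.append(("require", i, p.unit, required or [best_unit]))
    return constraints


def satisfied(constraints, selected, replaced):
    """`selected`: set of chosen units; `replaced`: set of rewritten pronoun indices."""
    for kind, i, unit, target in constraints:
        if unit not in selected:
            continue  # the pronoun is not in the summary, so nothing to enforce
        if kind == "replace" and target not in selected and i not in replaced:
            return False
        if kind == "require" and not all(t in selected for t in target):
            return False
    return True


pronouns = [
    PronounMention(unit=3, candidates={0: 0.9, 1: 0.1}),
    PronounMention(unit=4, candidates={1: 0.5, 2: 0.3, 0: 0.1}),
]
cs = build_constraints(pronouns)
print(cs)  # → [('replace', 0, 3, 0), ('require', 1, 4, [1, 2])]
print(satisfied(cs, selected={3}, replaced={0}))  # → True
print(satisfied(cs, selected={1, 4}, replaced=set()))  # → False
```

The confident pronoun may stay in the summary without its antecedent only if it is rewritten; the uncertain one instead drags its probable antecedents along, which matches the caption's point that extra context helps a reader even when the coreference system errs.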