End-to-end Neural Coreference Resolution

Kenton Lee, Luheng He, Mike Lewis, Luke Zettlemoyer

TL;DR

The paper presents the first end-to-end neural coreference resolution approach that jointly learns span detection and clustering without syntactic parsers, by scoring all spans with a span-based representation and marginalizing over antecedents. It introduces a span-embedding mechanism that combines boundary LSTM representations with a learned head-finding attention mechanism, plus a pruning scheme to manage computational complexity. Learning is driven by a marginal likelihood objective over gold antecedents, enabling effective credit assignment and high recall despite aggressive pruning. The approach achieves state-of-the-art results on OntoNotes, with notable gains from end-to-end learning and model ensembles, and provides interpretable insights into head-word preferences and failure modes.

Abstract

We introduce the first end-to-end coreference resolution model and show that it significantly outperforms all previous work without using a syntactic parser or hand-engineered mention detector. The key idea is to directly consider all spans in a document as potential mentions and learn distributions over possible antecedents for each. The model computes span embeddings that combine context-dependent boundary representations with a head-finding attention mechanism. It is trained to maximize the marginal likelihood of gold antecedent spans from coreference clusters and is factored to enable aggressive pruning of potential mentions. Experiments demonstrate state-of-the-art performance, with a gain of 1.5 F1 on the OntoNotes benchmark and of 3.1 F1 using a 5-model ensemble, even though this is the first approach to be successfully trained with no external resources.
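To make the training objective concrete, the following is a minimal sketch of the marginal log-likelihood for a single span. It assumes the coreference scores against each candidate antecedent are already computed, and that index 0 stands for a dummy "no antecedent" option. The function name, the argument layout, and the toy scores are illustrative assumptions, not taken from the authors' code.

```python
import math


def marginal_log_likelihood(scores, gold):
    """Log of the total probability assigned to the gold antecedents of one span.

    scores: list of coreference scores, one per candidate antecedent.
            Index 0 is assumed to be the dummy "no antecedent" option.
    gold:   non-empty set of indices into `scores` that are correct antecedents.
    """

    def logsumexp(xs):
        # Subtract the maximum before exponentiating for numerical stability.
        m = max(xs)
        return m + math.log(sum(math.exp(x - m) for x in xs))

    log_norm = logsumexp(scores)                      # softmax normalizer
    log_gold = logsumexp([scores[j] for j in gold])   # mass on gold antecedents
    return log_gold - log_norm


# Toy example (made-up scores): two of the three candidates are correct.
example = marginal_log_likelihood([0.0, 2.0, 2.0], {1, 2})
```

Under these assumptions, a training loss would be the negative sum of this quantity over all spans in a document. Because the probability mass is summed over every gold antecedent, the model is not forced to pick one particular antecedent from the cluster.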

Paper Structure

This paper contains 32 sections, 7 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: First step of the end-to-end coreference resolution model, which computes embedding representations of spans for scoring potential entity mentions. Low-scoring spans are pruned, so that only a manageable number of spans is considered for coreference decisions. In general, the model considers all possible spans up to a maximum width, but we depict here only a small subset.
  • Figure 2: Second step of our model. Antecedent scores are computed from pairs of span representations. The final coreference score of a pair of spans is computed by summing the mention scores of both spans and their pairwise antecedent score.
  • Figure 3: Proportion of gold mentions covered in the development data as we increase the number of spans kept per word. Recall is comparable to the mention detector of previous state-of-the-art systems given the same number of spans. Our model keeps 0.4 spans per word in our experiments, achieving 92.7% recall of gold mentions.
  • Figure 4: Indirect measure of mention precision using agreement with gold syntax. Constituency precision: % of unpruned spans matching syntactic constituents. Head word precision: % of unpruned constituents whose syntactic head word matches the most attended word. Frequency: % of gold spans with each width.
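
The score composition described for Figure 2 and the span budget mentioned for Figure 3 can be sketched as follows. This is a simplified illustration: the function names and the dictionary layout are hypothetical, the mention and antecedent scores are assumed to come from some upstream scorer, and the pruning step only keeps the top-scoring spans, ignoring any further constraints the full model may apply.

```python
def coref_score(mention_i, mention_j, pairwise_ij):
    """Final score of a span pair: both mention scores plus the antecedent score."""
    return mention_i + mention_j + pairwise_ij


def prune_spans(mention_scores, num_words, spans_per_word=0.4):
    """Keep the highest-scoring spans, up to spans_per_word * num_words of them.

    mention_scores: dict mapping (start, end) spans to mention scores.
    Returns the kept spans sorted by position in the document.
    """
    budget = int(spans_per_word * num_words)
    # Rank spans from highest to lowest mention score.
    ranked = sorted(mention_scores, key=mention_scores.get, reverse=True)
    return sorted(ranked[:budget])


# Toy example (made-up scores) for a 5-word document: the budget is 2 spans.
toy_scores = {(0, 0): 1.0, (0, 1): 3.0, (2, 2): -1.0, (3, 4): 2.0, (4, 4): 0.5}
kept = prune_spans(toy_scores, num_words=5)
```

Since the mention scores enter the final pair score additively, a span with a low mention score is unlikely to take part in any high-scoring pair, which is what makes pruning on mention scores alone a reasonable filter.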