SPLICE: A Singleton-Enhanced PipeLIne for Coreference REsolution

Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes

TL;DR

This work tackles the absence of singleton annotations in coreference datasets by reconstructing near-gold singletons using syntactic NP extraction and an ARRAU-based classifier, enabling a pipeline that separates mention detection from coreference linking. It introduces SPLICE, a two-step system that trains on the union of gold coreference markables and predicted mentions, achieving in-domain performance comparable to end-to-end models and improved generalization to out-of-domain data. Key findings show that precision gains in mention detection improve coreference clustering more than recall gains do, and that incorporating singletons enhances cross-domain stability (+1.1 avg. F1 on OntoGUM). The study provides a more interpretable framework for coreference by decoupling mention detection from linking and emphasizes the practical value of gold-like singleton annotations for robust discourse analysis across domains.

Abstract

Singleton mentions, i.e., entities mentioned only once in a text, are important to how humans understand discourse from a theoretical perspective. However, previous attempts to incorporate their detection in end-to-end neural coreference resolution for English have been hampered by the lack of singleton mention spans in the OntoNotes benchmark. This paper addresses this limitation by combining predicted mentions from existing nested NER systems and features derived from OntoNotes syntax trees. With this approach, we create a near approximation of the OntoNotes dataset with all singleton mentions, achieving ~94% recall on a sample of gold singletons. We then propose a two-step neural mention and coreference resolution system, named SPLICE, and compare its performance to the end-to-end approach in two scenarios: the OntoNotes test set and the out-of-domain (OOD) OntoGUM corpus. Results indicate that reconstructed singleton training yields results comparable to end-to-end systems for OntoNotes, while improving OOD stability (+1.1 avg. F1). We conduct error analysis for mention detection and delve into its impact on coreference clustering, revealing that precision improvements deliver more substantial benefits than increases in recall for resolving coreference chains.
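
The mention-detection side of the two-step setup can be illustrated with a minimal Python sketch. It is not the authors' implementation: the span representation (inclusive token offsets), the function names `training_mentions` and `mention_prf`, and the toy spans are all assumptions made for illustration. It shows how a training set might be formed as the union of gold coreferent spans and predicted mentions, and how mention-detection precision, recall, and F1 are computed over span sets.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start_token, end_token), both inclusive.
Span = Tuple[int, int]


def training_mentions(gold_coref: Set[Span], predicted: Set[Span]) -> Set[Span]:
    """Union of gold coreferent spans and predicted (possibly singleton) spans."""
    return gold_coref | predicted


def mention_prf(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 of predicted spans against gold spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration, not taken from OntoNotes).
gold_coref = {(0, 1), (5, 5)}            # spans that take part in coreference chains
predicted = {(0, 1), (3, 4), (8, 9)}     # output of some mention detector
gold_all = {(0, 1), (3, 4), (5, 5)}      # gold mentions including singletons

train = training_mentions(gold_coref, predicted)
precision, recall, f1 = mention_prf(predicted, gold_all)
```

In this toy case the spurious span `(8, 9)` lowers precision and the missed span `(5, 5)` lowers recall, which are the two error types whose downstream effects the paper's error analysis compares.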

Paper Structure (28 sections, 3 equations, 3 figures, 6 tables)

Figures (3)

  • Figure 1: An example of the utilization of a syntax tree for the extraction of mentions. The solid box signifies that the NP is a candidate for coreference linking in OntoNotes while the dashed box indicates that the NP is not categorized as a mention.
  • Figure 2: The Pipeline of the Two-step Coreference System Using Singletons. Gold markable spans are leveraged for training mention detection and coreference linking to enhance alignment with the OntoNotes annotation schema.
  • Figure 3: Analyzing the impact of recall and precision scores on the OntoNotes development set. The horizontal dashed line represents the baseline score, and the round data points denote the F1 scores achieved by the two-step training pipeline, aligned with their respective precision and recall scores. The vertical dashed line denotes an estimate of the avg. F1 and precision score with gold singletons.
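
As a rough illustration of the tree-based candidate extraction that Figure 1 depicts, the sketch below collects every NP span from a Penn-style bracketed parse. It is a simplified assumption-laden example, not the paper's procedure: the function name `np_spans` and the example sentence are hypothetical, only nodes labeled exactly `NP` are matched (function tags such as `NP-SBJ` and trace tokens are not handled), and the step that decides which NPs count as mentions (solid versus dashed boxes) is not modeled.

```python
from typing import List, Tuple


def np_spans(parse: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, text) for every NP in a bracketed parse.

    Token offsets are inclusive and count leaf words from 0.
    """
    tokens = parse.replace("(", " ( ").replace(")", " ) ").split()
    words: List[str] = []
    stack: List[Tuple[str, int]] = []  # (node label, index of its first word)
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            nxt = tokens[i + 1]
            if nxt == "(":
                # Unlabeled root bracket, as in "( (S ...))".
                stack.append(("", len(words)))
                i += 1
            else:
                stack.append((nxt, len(words)))
                i += 2
        elif tok == ")":
            label, start = stack.pop()
            if label == "NP" and len(words) > start:
                spans.append((start, len(words) - 1, " ".join(words[start:])))
            i += 1
        else:
            words.append(tok)  # a leaf word
            i += 1
    return sorted(spans)


# Hypothetical example sentence, not taken from OntoNotes.
example = (
    "(S (NP (DT The) (NN dog)) (VP (VBD chased) "
    "(NP (NP (DT a) (NN cat)) (PP (IN in) (NP (DT the) (NN yard))))))"
)
candidates = np_spans(example)
```

Nested NPs such as "a cat" inside "a cat in the yard" are returned as separate candidates, so a later filtering or classification step would still be needed to choose among them.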