
Unifying the Scope of Bridging Anaphora Types in English: Bridging Annotations in ARRAU and GUM

Lauren Levine, Amir Zeldes

TL;DR

A comparison of bridging annotations in the GUM, GENTLE and ARRAU corpora finds a large difference in the types of phenomena annotated as bridging.

Abstract

Comparing bridging annotations across coreference resources is difficult, largely due to a lack of standardization across definitions and annotation schemas and narrow coverage of disparate text domains across resources. To alleviate domain coverage issues and consolidate schemas, we compare guidelines and use interpretable predictive models to examine the bridging instances annotated in the GUM, GENTLE and ARRAU corpora. Examining these cases, we find that there is a large difference in types of phenomena annotated as bridging. Beyond theoretical results, we release a harmonized, subcategorized version of the test sets of GUM, GENTLE and the ARRAU Wall Street Journal data to promote meaningful and reliable evaluation of bridging resolution across domains.

Paper Structure (16 sections, 4 figures, 8 tables)

Figures (4)

  • Figure 1: Feature importance of XGBoost classifiers trained on GUM and ARRAU WSJ
  • Figure 2: Distribution for antecedent-anaphor entity type combinations for GUM/GENTLE (only combinations with a proportion of 1% or higher are visualized)
  • Figure 3: Distribution for antecedent-anaphor entity type combinations for ARRAU WSJ (only combinations with a proportion of 1% or higher are visualized)
  • Figure 4: Feature importance of XGBoost classifiers trained on GUM and ARRAU WSJ for Mean Decrease Accuracy (MDA)