How to Evaluate Coreference in Literary Texts?

Ana-Isabel Duron-Tejedor; Pascal Amsili; Thierry Poibeau

TL;DR

The paper addresses the evaluation of coreference in literary texts, arguing that standard NLP metrics are ill-suited to long narratives. It surveys traditional link-based and mention-based metrics (MUC, BLANC, B^3, CEAF, LEA) and highlights issues such as the mention-identification effect and a lack of interpretability. It shows that OntoNotes, a common evaluation corpus, is poorly matched to fiction because its texts are short and its referential forms restricted, so its chain-length patterns diverge from those of novels. The authors propose a context-aware, content-driven evaluation framework that separates long chains, short chains, and singletons to yield interpretable diagnostics, and they plan to validate it across languages and multiple novels.

Abstract

In this short paper, we examine the main metrics used to evaluate textual coreference and detail some of their limitations. We show that a single score cannot represent the full complexity of the problem at stake, and is thus uninformative or even misleading. We propose a new way of evaluating coreference that takes the context into account (in our case, the analysis of fiction, especially novels). More specifically, we propose to distinguish long coreference chains (corresponding to main characters) from short ones (corresponding to secondary characters) and from singletons (isolated elements). In this way, we hope to obtain more interpretable, and thus more informative, evaluation results.
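The three-way split described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `bucket_chains`, the dictionary representation of chains, and in particular the `long_threshold` cutoff, since the text above does not state where a short chain ends and a long one begins.

```python
def bucket_chains(chains, long_threshold=10):
    """Split coreference chains into singletons, short chains, and long chains.

    chains: dict mapping a chain id to its list of mentions (hypothetical format).
    long_threshold: illustrative cutoff, NOT taken from the paper.
    """
    buckets = {"singleton": [], "short": [], "long": []}
    for chain_id, mentions in chains.items():
        n = len(mentions)
        if n == 0:
            continue  # ignore empty chains
        if n == 1:
            buckets["singleton"].append(chain_id)
        elif n >= long_threshold:
            buckets["long"].append(chain_id)
        else:
            buckets["short"].append(chain_id)
    return buckets


# Toy data, invented for the example.
chains = {
    "manon": ["Manon", "she", "her", "Manon", "her"],
    "innkeeper": ["the innkeeper", "he"],
    "coach": ["the coach"],
}
buckets = bucket_chains(chains, long_threshold=5)
print(buckets)
```

Once chains are grouped this way, any existing metric could in principle be reported per group rather than as one global score, which is the kind of interpretable breakdown the abstract argues for.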

Paper Structure (7 sections, 1 figure, 1 table)

This paper contains 7 sections, 1 figure, 1 table.

Introduction
State of the Art in Coreference Evaluation Methods
The Inadequacy of Traditional Evaluation Schemes for Literary texts
The Inadequacy of Reference Corpora
The Low Readability of Existing Evaluation Methods
Proposals for a Better Evaluation of Coreference in Long Documents
Conclusion

Figures (1)

Figure 1: The distribution of coreference chains in Manon Lescaut, based on the number of mentions per chain. The distribution is not even Zipfian (if it were, the curve would be a straight line), since the number of very long chains is extremely limited. Experiments with other novels, by Balzac among others, yield a quasi-Zipfian curve, suggesting that Balzac's world contains more characters in primary and secondary roles.
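One way to check how close a chain-length distribution is to Zipfian is sketched below. It assumes a rank-size reading of the figure (chains sorted by number of mentions, plotted on log-log axes), which the caption does not spell out; the function name `loglog_fit` and the sample sizes are hypothetical. A Zipfian distribution gives a straight line in log-log space, so a least-squares fit with R² close to 1 indicates a near-Zipfian shape.

```python
import math


def loglog_fit(chain_sizes):
    """Fit log(size) against log(rank) by ordinary least squares.

    Returns (slope, r_squared). Assumes a rank-size interpretation,
    which is an assumption of this sketch, not a detail from the paper.
    """
    sizes = sorted((s for s in chain_sizes if s > 0), reverse=True)
    if len(sizes) < 2:
        raise ValueError("need at least two non-empty chains")
    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(size) for size in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # If all sizes are equal, the points lie on a flat line: treat as a perfect fit.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, r_squared


# Invented sizes following size = 1200 / rank, i.e. an exactly Zipfian toy case.
slope, r_squared = loglog_fit([1200, 600, 400, 300, 240, 200])
print(slope, r_squared)
```

A distribution like the one described for Manon Lescaut, with very few long chains and many short ones, would be expected to show a visibly lower R² than the toy case above, though the exact values depend on the data.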
