[Lions: 1] and [Tigers: 2] and [Bears: 3], Oh My! Literary Coreference Annotation with LLMs
Rebecca M. M. Hicke, David Mimno
TL;DR
This paper addresses the challenge of annotating coreference in fiction by fine-tuning generative LLMs to emit inline, markdown-like coreference annotations directly from sentences. It compares encoder-decoder (T5/mT5) and decoder-only (Pythia) architectures, finding that T5-3b delivers the best performance with high exact-match replication and strong entity and coreference F1 scores on the LitBank corpus. Multilingual variants (mT5) show potential but generally underperform the English-focused T5 models, while decoder-only Pythia models perform poorly for this task. The study provides a runnable workflow, data splits, and an evaluation framework, enabling scalable, reproducible literary coreference annotation and suggesting future work to extend context length and capture richer relational annotations such as emotions or power dynamics.
Abstract
Coreference annotation and resolution are vital components of computational literary studies. However, it has previously been difficult to build high-quality systems for fiction. Coreference requires complicated structured outputs, and literary text involves subtle inferences and highly varied language. New language-model-based seq2seq systems present the opportunity to solve both of these problems by learning to directly generate a copy of an input sentence with markdown-like annotations. We create, evaluate, and release several trained models for coreference, as well as a workflow for training new models.
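To make the output format concrete, the following is a minimal sketch of how an annotated sentence might be read back into mentions and clusters. It assumes the inline pattern shown in the title, `[mention text: cluster id]`, and it does not handle nested mentions; the names `parse_annotated` and `clusters` are hypothetical helpers for illustration, not part of the released workflow.

```python
import re

# Assumed annotation pattern, inferred from the title: "[mention text: cluster_id]".
# Nested mentions are NOT handled in this sketch.
MENTION = re.compile(r"\[([^\[\]]+?):\s*(\d+)\]")


def parse_annotated(annotated):
    """Return (plain_text, mentions).

    Each mention is (start, end, text, cluster_id), with character offsets
    given relative to the recovered plain text.
    """
    plain_parts = []
    mentions = []
    cursor = 0      # position in the annotated string
    plain_len = 0   # length of the plain text rebuilt so far
    for m in MENTION.finditer(annotated):
        before = annotated[cursor:m.start()]
        plain_parts.append(before)
        plain_len += len(before)
        text = m.group(1)
        mentions.append((plain_len, plain_len + len(text), text, int(m.group(2))))
        plain_parts.append(text)
        plain_len += len(text)
        cursor = m.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), mentions


def clusters(mentions):
    """Group mention strings by cluster id."""
    grouped = {}
    for _, _, text, cluster_id in mentions:
        grouped.setdefault(cluster_id, []).append(text)
    return grouped


plain, found = parse_annotated("[Lions: 1] and [Tigers: 2] and [Bears: 3], Oh My!")
print(plain)
print(clusters(found))
```

Because the model is trained to copy the input sentence, comparing the recovered plain text with the original input gives a simple check that the generated output changed nothing except the added annotations.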
