The Role of Natural Language Processing Tasks in Automatic Literary Character Network Construction

Arthur Amalvy, Vincent Labatut, Richard Dufour

TL;DR

The paper investigates how low-level NLP tasks affect the automatic construction of literary character networks, focusing on NER and coreference resolution within a cascading Renard-based pipeline and contrasting it with LLM-based approaches, including an end-to-end one. It introduces a perturbation framework that injects errors into gold annotations to quantify their impact on vertex and edge quality, and it proposes new network-accuracy measures grounded in alias matching and coreference graphs. The findings show that NER performance is strongly novel-dependent, that coreference resolution is crucial for detecting co-occurrences, and that certain errors (notably spurious coreference links) are particularly damaging to network integrity; overall, the traditional pipeline outperforms the tested LLM-based methods in recall. The work provides a reproducible benchmarking setup, new evaluation metrics, and a pathway for improving character-network extraction in literary texts, and it highlights the need for longer, richly annotated datasets for robust evaluation.

Abstract

The automatic extraction of character networks from literary texts is generally carried out using natural language processing (NLP) cascading pipelines. While this approach is widespread, no study exists on the impact of low-level NLP tasks on their performance. In this article, we conduct such a study on a literary dataset, focusing on the role of named entity recognition (NER) and coreference resolution when extracting co-occurrence networks. To highlight the impact of these tasks' performance, we start with gold-standard annotations, progressively add uniformly distributed errors, and observe their impact in terms of character network quality. We demonstrate that NER performance depends on the tested novel and strongly affects character detection. We also show that NER-detected mentions alone miss a lot of character co-occurrences, and that coreference resolution is needed to prevent this. Finally, we present comparison points with 2 methods based on large language models (LLMs), including a fully end-to-end one, and show that these models are outperformed by traditional NLP pipelines in terms of recall.
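The perturbation idea described in the abstract can be illustrated with a toy sketch: build a co-occurrence network from gold mentions, remove mentions uniformly at random in successive degradation steps, and track how many gold edges survive. This is not the authors' pipeline or their metrics; the 10-token co-occurrence window, the function names (`cooccurrence_edges`, `remove_mentions`, `edge_recall`), and the example mentions are all assumptions made for illustration.

```python
import random
from itertools import combinations


def cooccurrence_edges(mentions, window=10):
    """Return the set of character pairs mentioned within `window` tokens.

    `mentions` is a list of (token_index, character_name) tuples.
    The window size is an illustrative assumption, not the paper's setting.
    """
    edges = set()
    for (i, a), (j, b) in combinations(sorted(mentions), 2):
        if a != b and abs(i - j) <= window:
            edges.add(frozenset((a, b)))
    return edges


def remove_mentions(mentions, n_steps, rng):
    """Drop one uniformly chosen mention per degradation step."""
    kept = list(mentions)
    for _ in range(min(n_steps, len(kept))):
        kept.pop(rng.randrange(len(kept)))
    return kept


def edge_recall(gold_edges, predicted_edges):
    """Fraction of gold edges that are still present after degradation."""
    if not gold_edges:
        return 1.0
    return len(gold_edges & predicted_edges) / len(gold_edges)


# Hypothetical gold annotations: (token index, character name).
gold_mentions = [
    (0, "Alice"), (4, "Bob"),
    (20, "Alice"), (25, "Carol"),
    (40, "Bob"), (43, "Carol"),
]
gold_edges = cooccurrence_edges(gold_mentions)

rng = random.Random(0)  # fixed seed so runs are repeatable
for steps in range(0, len(gold_mentions) + 1, 2):
    degraded = remove_mentions(gold_mentions, steps, rng)
    recall = edge_recall(gold_edges, cooccurrence_edges(degraded))
    print(f"{steps} mentions removed -> edge recall {recall:.2f}")
```

Removing mentions can only lose edges in this sketch, so recall never exceeds its unperturbed value. Perturbations that add spurious mentions or coreference links would need a precision-style measure as well.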

Paper Structure

This paper contains 41 sections, 4 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Network extraction performance when applying the coreference perturbations from Section \ref{sec:methods-coref-perturbation}.
  • Figure 2: Network quality measures versus number of degradation steps for "add spurious alias mention" and "remove correct alias mention" perturbations.
  • Figure 3: Network quality measures versus the number of coreference resolution degradation steps.