BookWorm: A Dataset for Character Description and Analysis

Argyrios Papoudakis, Mirella Lapata, Frank Keller

TL;DR

This study explores the understanding of characters in full-length books, which contain complex narratives and numerous interacting characters, and evaluates state-of-the-art long-context models in zero-shot and fine-tuning settings, utilizing both retrieval-based and hierarchical processing for book-length inputs.

Abstract

Characters are at the heart of every story, driving the plot and engaging readers. In this study, we explore the understanding of characters in full-length books, which contain complex narratives and numerous interacting characters. We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation, including character development, personality, and social context. We introduce the BookWorm dataset, pairing books from Project Gutenberg with human-written descriptions and analyses. Using this dataset, we evaluate state-of-the-art long-context models in zero-shot and fine-tuning settings, utilizing both retrieval-based and hierarchical processing for book-length inputs. Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks. Additionally, fine-tuned models using coreference-based retrieval produce the most factual descriptions, as measured by fact- and entailment-based metrics. We hope our dataset, experiments, and analysis will inspire further research in character-based narrative understanding.

Paper Structure

This paper contains 28 sections, 2 figures, 22 tables.

Figures (2)

  • Figure 1: Examples of character description and analysis. Both refer to the transformation of Esther Summerson from a hesitant to a confident narrator. However, the analysis provides more detail focusing on her skill as a narrator (red). The description includes Esther's attributes and behaviour, referring to her as a selfless and nurturing figure, while the analysis provides an interpretation of this trait based on her background (green). The character description briefly touches on Esther's background, while the analysis demonstrates how, ironically and indirectly, she causes pain to others (grey), adding a moral and psychological dimension.
  • Figure 2: Distribution of genres in BookWorm for character description and analysis tasks.