CHATTER: A Character Attribution Dataset for Narrative Understanding

Sabyasachee Baruah, Shrikanth Narayanan

TL;DR

The paper introduces Chatter, a scalable dataset for narrative character attribution that uses TVTropes tropes as discrete character attributes and links them to the long-form screenplays of 660 films, yielding 88,124 character-trope pairs over 2,998 characters and 12,967 tropes. A human-annotated subset, ChatterEval, provides a reliable evaluation benchmark with multi-annotator judgments and a mechanism to weight samples by confidence, enabling robust assessment of long-context understanding. Experiments across five models and multiple prompting strategies show that closed-source LLMs and segment-based prompts perform best, while Chatter labels closely track model outputs, supporting Chatter as a training resource for attribution tasks. The work highlights both the value and the limitations of trope-based attribution, emphasizing long-context processing and the need for careful evaluation that distinguishes model priors from script-derived cues. Future work aims to improve coverage, annotation reliability, and model training for narrative understanding.

Abstract

Computational narrative understanding studies the identification, description, and interaction of the elements of a narrative: characters, attributes, events, and relations. Narrative research has given considerable attention to defining and classifying character types. However, these character-type taxonomies do not generalize well because they are small, too simple, or specific to a domain. We require robust and reliable benchmarks to test whether narrative models truly understand the nuances of a character's development in the story. Our work addresses this by curating the CHATTER dataset, which labels whether a character portrays some attribute for 88,124 character-attribute pairs, encompassing 2,998 characters, 12,967 attributes, and 660 movies. We validate a subset of CHATTER, called CHATTEREVAL, using human annotations to serve as a benchmark for evaluating the character attribution task in movie scripts. CHATTEREVAL also assesses the narrative understanding and long-context modeling capacity of language models.

Paper Structure

This paper contains 20 sections, 1 figure, 7 tables.

Figures (1)

  • Figure 1: Interface of the annotation task.