Who Is the Story About? Protagonist Entity Recognition in News
Jorge Gabín, M. Eduardo Ares, Javier Parapar
TL;DR
Protagonist Entity Recognition (PER) reframes information extraction as a discourse-level task: determining which organizations drive a news narrative, rather than merely listing mentions. The authors formalize PER, create a human-annotated gold benchmark, and demonstrate that exemplar-guided prompting of state-of-the-art LLMs can approximate human judgments and scale annotation to large corpora. They further investigate how context and exemplar choices affect performance, showing that larger models with exemplars achieve robust protagonist detection while smaller models require careful calibration. The work provides datasets, baselines, and methodological guidance for narrative-centered information extraction, with potential impact on media analysis, knowledge graph enrichment, and downstream narrative understanding.
Abstract
News articles often reference numerous organizations, but traditional Named Entity Recognition (NER) treats all mentions equally, obscuring which entities genuinely drive the narrative. This limits downstream tasks that rely on understanding event salience, influence, or narrative focus. We introduce Protagonist Entity Recognition (PER), a task that identifies the organizations that anchor a news story and shape its main developments. To validate PER, we compare the predictions of Large Language Models (LLMs) against annotations from four expert annotators over a gold corpus, establishing both inter-annotator consistency and human-LLM agreement. Leveraging these findings, we use state-of-the-art LLMs to automatically label large-scale news collections through NER-guided prompting, generating scalable, high-quality supervision. We then evaluate whether other LLMs, given reduced context and without explicit candidate guidance, can still infer the correct protagonists. Our results demonstrate that PER is a feasible and meaningful extension to narrative-centered information extraction, and that guided LLMs can approximate human judgments of narrative importance at scale.
