
From Variance to Invariance: Qualitative Content Analysis for Narrative Graph Annotation

Junbo Huang, Max Weinig, Ulrich Fritsche, Ricardo Usbeck

TL;DR

This paper introduces a narrative graph annotation framework that integrates principles from qualitative content analysis (QCA) to prioritize annotation quality by reducing annotation errors. It also provides practical guidance for NLP research on graph-based narrative annotation under human label variation (HLV).

Abstract

Narratives in news discourse play a critical role in shaping public understanding of economic events, such as inflation. Annotating and evaluating these narratives in a structured manner remains a key challenge for Natural Language Processing (NLP). In this work, we introduce a narrative graph annotation framework that integrates principles from qualitative content analysis (QCA) to prioritize annotation quality by reducing annotation errors. We present a dataset of inflation narratives annotated as directed acyclic graphs (DAGs), where nodes represent events and edges encode causal relations. To evaluate annotation quality, we employed a $6\times3$ factorial experimental design to examine the effects of narrative representation (six levels) and distance metric type (three levels) on inter-annotator agreement (Krippendorff's $\alpha$), capturing the presence of human label variation (HLV) in narrative interpretations. Our analysis shows that (1) lenient metrics (overlap-based distance) overestimate reliability, and (2) locally constrained representations (e.g., one-hop neighbors) reduce annotation variability. Our annotations and our implementation of graph-based Krippendorff's $\alpha$ are open-sourced. The annotation framework and evaluation results provide practical guidance for NLP research on graph-based narrative annotation under HLV.
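To make the agreement measure concrete, the following is a minimal sketch of Krippendorff's $\alpha$ computed as $1 - D_o / D_e$ with a pluggable distance function over annotated edge sets. It is not the authors' released implementation. The function names, the toy edge labels, and the choice to plug the distance in directly (rather than squaring it) are assumptions made for illustration, and the two distances shown (Jaccard as an overlap-based distance, exact match as a strict one) are only stand-ins for the three metric types the abstract mentions.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Overlap-based (lenient) distance between two edge sets."""
    a, b = frozenset(a), frozenset(b)
    union = a | b
    if not union:
        return 0.0  # two empty annotations are treated as identical
    return 1.0 - len(a & b) / len(union)


def exact_match_distance(a, b):
    """Strict distance: 0 only if both annotations are identical."""
    return 0.0 if frozenset(a) == frozenset(b) else 1.0


def krippendorff_alpha(units, distance):
    """Krippendorff's alpha for a generic distance function.

    units: one inner list per document, holding the annotations
           (here: sets of edges) produced by each annotator.
    """
    # Only documents with at least two annotations are pairable.
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable annotations")

    # Observed disagreement: within-document pairs, weighted by 1/(m_u - 1).
    d_o = 0.0
    for u in pairable:
        within = sum(distance(a, b) for a, b in combinations(u, 2))
        d_o += 2.0 * within / (len(u) - 1)
    d_o /= n

    # Expected disagreement: all pairs of annotations, ignoring documents.
    d_e = 2.0 * sum(distance(a, b) for a, b in combinations(values, 2))
    d_e /= n * (n - 1)

    if d_e == 0.0:
        return 1.0  # no variation at all in the data
    return 1.0 - d_o / d_e


# Hypothetical toy data: two documents, two annotators each.
e1 = ("Energy prices", "Inflation")
e2 = ("Wages", "Inflation")
e3 = ("Rate hikes", "Inflation")
toy_units = [[{e1, e2}, {e1}], [{e3}, {e3}]]

alpha_lenient = krippendorff_alpha(toy_units, jaccard_distance)
alpha_strict = krippendorff_alpha(toy_units, exact_match_distance)
```

On this toy input the overlap-based distance gives $\alpha \approx 0.67$, while exact match gives $\alpha = 0.40$. That illustrates only how partial credit for overlapping edges can raise the agreement score. These numbers come from the made-up example, not from the paper's dataset.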

Paper Structure

This paper contains 44 sections, 8 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Example of narrative graph annotation under human label variation.
  • Figure 2: Flowchart of the iterative QCA research procedure.
  • Figure 3: Visual example of six types of narrative representations. (a-c) represent the categorical representation of narratives, where (a) includes all annotated events causing and affected by Inflation, (b) includes only events directly causing Inflation and (c) includes a set of relations (Increases or Decreases) from (e). (d-f) represent the graph-based representation of narratives, where (d) includes all annotated events and their relations from and to Inflation, (e) includes only annotated events directly causing Inflation, as well as their relations, and (f) includes only annotated events directly and indirectly causing Inflation, as well as their relations.
  • Figure 4: Distribution of documents across publication years. The sampling strategy focuses on inflation-peak years: 1990, 1991, 1999, 2001, 2007, 2008, 2021, 2022 and 2023.
  • Figure 5: Average word count of documents across publication years.
  • ...and 1 more figure
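The representations described in the Figure 3 caption can be read as different projections of one annotated graph. The sketch below shows how three of them might be derived: the set of events directly causing Inflation (b), the one-hop graph (e), and the graph of direct and indirect causes (f). It is an illustration under stated assumptions, not the paper's code. The `(source, relation, target)` triple encoding, the function names, and every event name other than Inflation are hypothetical.

```python
def direct_causes(edges, target="Inflation"):
    """Categorical view (b): events with an edge pointing directly at target."""
    return {src for src, _rel, dst in edges if dst == target}


def one_hop_subgraph(edges, target="Inflation"):
    """Graph view (e): direct causes of target together with their relations."""
    return {(src, rel, dst) for src, rel, dst in edges if dst == target}


def upstream_subgraph(edges, target="Inflation"):
    """Graph view (f): every edge on a causal path leading into target."""
    reached = {target}
    frontier = [target]
    kept = set()
    while frontier:
        node = frontier.pop()
        for src, rel, dst in edges:
            if dst == node:
                kept.add((src, rel, dst))
                if src not in reached:
                    reached.add(src)
                    frontier.append(src)
    return kept


# Hypothetical annotation of one document as (source, relation, target) triples.
toy_edges = [
    ("War", "Increases", "Energy prices"),
    ("Energy prices", "Increases", "Inflation"),
    ("Rate hikes", "Decreases", "Inflation"),
    ("Inflation", "Decreases", "Purchasing power"),
]

causes = direct_causes(toy_edges)
one_hop = one_hop_subgraph(toy_edges)
upstream = upstream_subgraph(toy_edges)
```

Here `causes` holds the two events adjacent to Inflation, `one_hop` keeps their two edges, and `upstream` additionally keeps the War edge while dropping the effect on Purchasing power.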