Beyond LLMs: A Linguistic Approach to Causal Graph Generation from Narrative Texts

Zehan Li, Ruhua Pan, Xinyu Pi

TL;DR

This work tackles the challenge of deriving fine-grained causal graphs from narrative texts, beyond high-level causality. It introduces an end-to-end framework that combines agent-centered vertex extraction, a seven-feature Expert Index, STAC sentence categorization, and a five-step graph-construction process guided by LangChain prompts. The approach, validated on 100 public-domain narratives, shows significant improvements in causal graph quality over strong LLM baselines like GPT-4o and Claude 3.5, while maintaining readability and interpretability. By providing open-source tooling and a linguistics-informed methodology, the method supports scalable, nuanced causal reasoning for discourse analysis and knowledge-graph construction in narrative domains.

Abstract

We propose a novel framework for generating causal graphs from narrative texts, bridging high-level causality and detailed event-specific relationships. Our method first extracts concise, agent-centered vertices using large language model (LLM)-based summarization. We introduce an "Expert Index," comprising seven linguistically informed features, integrated into a Situation-Task-Action-Consequence (STAC) classification model. This hybrid system, combining RoBERTa embeddings with the Expert Index, achieves superior precision in causal link identification compared to pure LLM-based approaches. Finally, a structured five-iteration prompting process refines and constructs connected causal graphs. Experiments on 100 narrative chapters and short stories demonstrate that our approach consistently outperforms GPT-4o and Claude 3.5 in causal graph quality, while maintaining readability. The open-source tool provides an interpretable, efficient solution for capturing nuanced causal chains in narratives.

Paper Structure

This paper contains 33 sections, 1 equation, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Overview of our DA framework, an end-to-end model. The input is an arbitrary narrative text. In Stage 1, we construct the vertices of the graph. In Stage 2, we apply our Expert Index to the vertices. In Stage 3, we use a STAC system to label the vertices. In Stage 4, we combine the STAC labels with the vertices to complete the causal graph.
  • Figure 2: F1-score comparison across STAC labels for all six models. Each curve corresponds to a classification method, plotting the F1-score for each of the four individual labels (S, T, A, C) and the overall macro-F1-score (rightmost point). The XGBoost model using both RoBERTa embeddings and Expert Index features (red curve) achieves the highest F1-score in every category.
  • Figure 3: Example Graph Generation of Emperor's Cloth
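
For readers unfamiliar with the metric in Figure 2, the following is a minimal sketch of how per-label F1 and macro-F1 are conventionally computed over the four STAC labels. The function names and the toy gold/predicted labels are illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import Counter

# The four STAC labels: Situation, Task, Action, Consequence.
LABELS = ["S", "T", "A", "C"]


def per_label_f1(gold, pred, labels=LABELS):
    """Return a dict mapping each label to its F1-score."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-label F1-scores."""
    scores = per_label_f1(gold, pred, labels)
    return sum(scores.values()) / len(labels)


# Toy example (hypothetical labels, for illustration only).
gold = ["S", "T", "A", "C", "A", "C"]
pred = ["S", "A", "A", "C", "A", "T"]
scores = per_label_f1(gold, pred)
overall = macro_f1(gold, pred)
```

Because macro-F1 weights every label equally, a model that does poorly on a rare label is penalized as heavily as one that does poorly on a frequent label.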