Beyond LLMs: A Linguistic Approach to Causal Graph Generation from Narrative Texts
Zehan Li, Ruhua Pan, Xinyu Pi
TL;DR
This work derives fine-grained causal graphs from narrative texts, capturing event-level relationships rather than only high-level causality. It introduces an end-to-end framework that combines agent-centered vertex extraction, a seven-feature Expert Index, Situation-Task-Action-Consequence (STAC) sentence categorization, and a five-step graph-construction process guided by LangChain prompts. Validated on 100 public-domain narratives, the approach significantly improves causal graph quality over strong LLM baselines such as GPT-4o and Claude 3.5 while maintaining readability and interpretability. With open-source tooling and a linguistics-informed methodology, it supports scalable, nuanced causal reasoning for discourse analysis and knowledge-graph construction in narrative domains.
Abstract
We propose a novel framework for generating causal graphs from narrative texts, bridging high-level causality and detailed event-specific relationships. Our method first extracts concise, agent-centered vertices using large language model (LLM)-based summarization. We introduce an "Expert Index," comprising seven linguistically informed features, integrated into a Situation-Task-Action-Consequence (STAC) classification model. This hybrid system, combining RoBERTa embeddings with the Expert Index, achieves higher precision in causal link identification than pure LLM-based approaches. Finally, a structured five-iteration prompting process constructs and refines connected causal graphs. Experiments on 100 narrative chapters and short stories demonstrate that our approach consistently outperforms GPT-4o and Claude 3.5 in causal graph quality while maintaining readability. The open-source tool provides an interpretable, efficient solution for capturing nuanced causal chains in narratives.
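To make the target output concrete, the following is a minimal sketch of what a connected causal graph over STAC-labeled vertices could look like as a data structure. It is not the authors' implementation: the class name `CausalGraph`, its methods, and the example sentences are all hypothetical and chosen only for illustration. The sketch assumes that each vertex is a short agent-centered statement carrying one of the four STAC labels, that each edge points from cause to effect, and that "connected" means weakly connected (reachable when edge direction is ignored).

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

# The four STAC categories named in the abstract.
STAC_LABELS = ("Situation", "Task", "Action", "Consequence")


@dataclass
class CausalGraph:
    """Hypothetical container: vertex text -> STAC label, plus directed edges."""
    labels: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)  # (cause, effect) pairs

    def add_vertex(self, text, label):
        if label not in STAC_LABELS:
            raise ValueError(f"unknown STAC label: {label}")
        self.labels[text] = label

    def add_edge(self, cause, effect):
        if cause not in self.labels or effect not in self.labels:
            raise KeyError("both endpoints must be added as vertices first")
        self.edges.add((cause, effect))

    def is_weakly_connected(self):
        """Breadth-first search over the graph with edge direction ignored."""
        if not self.labels:
            return True
        neighbors = defaultdict(set)
        for cause, effect in self.edges:
            neighbors[cause].add(effect)
            neighbors[effect].add(cause)
        start = next(iter(self.labels))
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        return len(seen) == len(self.labels)


# Invented example narrative, for illustration only.
g = CausalGraph()
g.add_vertex("The village well runs dry", "Situation")
g.add_vertex("Mara must find water", "Task")
g.add_vertex("Mara digs a new well", "Action")
g.add_vertex("The village has water again", "Consequence")
g.add_edge("The village well runs dry", "Mara must find water")
g.add_edge("Mara must find water", "Mara digs a new well")
g.add_edge("Mara digs a new well", "The village has water again")
print(g.is_weakly_connected())
```

A connectivity check of this kind is one plausible way to decide whether a further refinement pass is needed: a vertex with no causal link to the rest of the graph would make `is_weakly_connected()` return `False`.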
