Cascading Large Language Models for Salient Event Graph Generation
Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He
TL;DR
This work tackles the challenge of extracting salient, multi-relational event graphs from long documents by introducing CALLMSAE, a cascading LLM framework that first identifies salient events through document summarisation and then generates hierarchical, temporal, and causal event relations via code-based prompting. It departs from traditional bottom-up extraction pipelines (e.g., CAEVO) by focusing on salient events and by leveraging iterative refinement with a hallucination grader, producing high-quality graphs that serve as distant supervision signals for contextualised graph generation models. The authors present NYT-SEG, a large-scale LLM-generated dataset of $10{,}231$ documents with a human-annotated test set of $100$ documents, and show that fine-tuning contextualised models on NYT-SEG yields improvements over CAEVO-based baselines, with human evaluations confirming the quality of the generated graphs. A novel Hungarian Graph Similarity metric assesses edge-level correspondence under semantic embeddings, and the results demonstrate that combining code-based graph prompts, iterative refinement, and the hierarchical-temporal-causal generation order yields salient, accurate event graphs suitable for downstream reasoning tasks.
Abstract
Generating event graphs from long documents is challenging due to the inherent complexity of the multiple tasks involved, such as detecting events, identifying their relationships, and reconciling unstructured input with structured graphs. Recent studies typically treat all events as equally important, failing to distinguish the salient events that are crucial for understanding narratives. This paper presents CALLMSAE, a CAscading Large Language Model framework for SAlient Event graph generation, which leverages the capabilities of LLMs and eliminates the need for costly human annotations. We first prompt LLMs to generate document summaries, from which salient events are identified. Next, we develop an iterative code refinement prompting strategy to generate event relation graphs, removing hallucinated relations and recovering missing edges. Powered by CALLMSAE, we present \textit{NYT-SEG}, a large-scale automatically annotated event graph dataset which can serve as a source of distant supervision signals. Fine-tuning contextualised graph generation models on \textit{NYT-SEG} outperforms training the same models on CAEVO data. Results on a human-annotated test set show that the proposed method generates salient and more accurate graphs, outperforming competitive baselines.
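The Hungarian Graph Similarity metric mentioned above compares a generated graph with a reference graph at the edge level. A minimal sketch of the general idea follows: embed each edge as text, build a pairwise similarity matrix, and solve the optimal one-to-one assignment with the Hungarian algorithm. The edge-to-text template, the embedding function, and the normalisation by the larger graph are illustrative assumptions here, not the paper's exact formulation.

```python
# Illustrative sketch of a Hungarian-style graph similarity.
# Assumptions (not from the paper): edges are (head, relation, tail)
# triples rendered as text, `embed` returns unit-norm vectors, and the
# score is normalised by the size of the larger graph so that missing
# or spurious edges are penalised.
import numpy as np
from scipy.optimize import linear_sum_assignment

def edge_text(edge):
    # Render an edge triple as a plain-text string for embedding.
    head, relation, tail = edge
    return f"{head} {relation} {tail}"

def hungarian_graph_similarity(pred_edges, gold_edges, embed):
    """Average cosine similarity of optimally matched edge pairs.

    `embed` maps a string to a unit-norm vector (e.g. from a sentence
    encoder); any callable with that contract works here.
    """
    if not pred_edges or not gold_edges:
        return 0.0
    P = np.stack([embed(edge_text(e)) for e in pred_edges])
    G = np.stack([embed(edge_text(e)) for e in gold_edges])
    sim = P @ G.T                      # cosine similarity (unit vectors)
    cost = 1.0 - sim                   # Hungarian solves a min-cost problem
    rows, cols = linear_sum_assignment(cost)
    # Normalise by the larger edge set to penalise unmatched edges.
    return float(sim[rows, cols].sum() / max(len(pred_edges), len(gold_edges)))
```

With this normalisation, identical graphs score 1.0 and a prediction with extra or missing edges scores strictly less, which matches the intuition of an edge-level correspondence metric.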
