Decompose, Enrich, and Extract! Schema-aware Event Extraction using LLMs
Fatemeh Shiri, Van Nguyen, Farhad Moghimifar, John Yoo, Gholamreza Haffari, Yuan-Fang Li
TL;DR
The paper tackles hallucination in LLM-based event extraction (EE) by decomposing the task into Event Detection (ED) and Event Argument Extraction (EAE), and by enriching prompts with schema-aware, granular instructions and dynamic Retrieval-Augmented Examples (RAE). A two-stage pipeline retrieves the top-$K$ most similar training instances by embedding similarity (using FAISS with an IndexFlatL2 index) and adds these exemplars to the prompts for both ED and EAE. Evaluations on ACE05-EN, WikiEvents, and a new MaritimeEvent benchmark show consistent improvements over baselines, with notable gains in Trig-C and Arg-C F1 scores, and demonstrate the value of decomposition and retrieval augmentation, especially in low-resource or domain-adaptation scenarios. These results matter in practice for automatic knowledge-graph construction and decision-support systems, which depend on reliable, scalable EE from large text corpora.
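The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces FAISS with a pure-Python exhaustive search over squared L2 distances (the same kind of exact, brute-force comparison an IndexFlatL2 index performs), and the toy 2-D embeddings, the example sentences, the function names, and the prompt layout are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def retrieve_top_k(query: Vector, index: List[Vector], k: int) -> List[int]:
    """Return the positions of the k vectors closest to the query (exhaustive search)."""
    scored = sorted((squared_l2(query, vec), i) for i, vec in enumerate(index))
    return [i for _, i in scored[:k]]

def build_ed_prompt(sentence: str, exemplars: List[Tuple[str, str]]) -> str:
    """Assemble a hypothetical Event Detection prompt from retrieved exemplars."""
    lines = ["Identify the event type expressed in the sentence."]
    for text, event_type in exemplars:
        lines.append(f"Example: {text} -> {event_type}")
    lines.append(f"Sentence: {sentence} ->")
    return "\n".join(lines)

# Toy training set: (sentence, event type, hand-made embedding).
# Real embeddings would come from a sentence encoder; these are placeholders.
train = [
    ("The vessel was attacked near the port.", "Conflict:Attack", [0.9, 0.1]),
    ("She was elected mayor in 2004.", "Personnel:Elect", [0.1, 0.9]),
    ("Rebels bombed the convoy.", "Conflict:Attack", [0.8, 0.2]),
]

query_sentence = "Pirates fired on the cargo ship."
query_vec = [0.88, 0.12]  # placeholder embedding of the query sentence

top_ids = retrieve_top_k(query_vec, [emb for _, _, emb in train], k=2)
exemplars = [(train[i][0], train[i][1]) for i in top_ids]
prompt = build_ed_prompt(query_sentence, exemplars)
print(prompt)
```

In this sketch the same retrieved exemplars could be reused to build a second prompt for EAE once an event type has been detected; that step is omitted here because its exact format is not given in this summary.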
Abstract
Large Language Models (LLMs) demonstrate significant capabilities in processing natural language data, promising efficient knowledge extraction from diverse textual sources to enhance situational awareness and support decision-making. However, concerns arise due to their susceptibility to hallucination, resulting in contextually inaccurate content. This work focuses on harnessing LLMs for automated Event Extraction, introducing a new method to address hallucination by decomposing the task into Event Detection and Event Argument Extraction. Moreover, the proposed method integrates dynamic schema-aware augmented retrieval examples into prompts tailored for each specific inquiry, thereby extending and adapting advanced prompting techniques such as Retrieval-Augmented Generation. Evaluation findings on prominent event extraction benchmarks and results from a synthesized benchmark illustrate the method's superior performance compared to baseline approaches.
