Extracting Events Like Code: A Multi-Agent Programming Framework for Zero-Shot Event Extraction

Quanjiang Guo, Sijie Wang, Jinchuan Zhang, Ben Zhang, Zhao Kang, Ling Tian, Ke Yan

TL;DR

The paper tackles the challenge of zero-shot event extraction by reframing it as a structured code-generation task. It introduces Agent-Event-Coder (AEC), a four-agent framework (Retrieval, Planning, Coding, Verification) that represents event schemas as executable Python classes and uses a schema-as-code verification loop to enforce structural fidelity. A dual-loop refinement procedure iteratively patches code guided by deterministic feedback, enabling precise, complete extractions without labeled data. Across five benchmarks and six LLM backbones, AEC consistently outperforms strong zero-shot baselines, demonstrating robust generalization and the practical viability of treating event extraction like code generation.

Abstract

Zero-shot event extraction (ZSEE) remains a significant challenge for large language models (LLMs) due to the need for complex reasoning and domain-specific understanding. Direct prompting often yields incomplete or structurally invalid outputs, such as misclassified triggers, missing arguments, and schema violations. To address these limitations, we present Agent-Event-Coder (AEC), a novel multi-agent framework that treats event extraction like software engineering: as a structured, iterative code-generation process. AEC decomposes ZSEE into specialized subtasks (retrieval, planning, coding, and verification), each handled by a dedicated LLM agent. Event schemas are represented as executable class definitions, enabling deterministic validation and precise feedback via a verification agent. This programming-inspired approach allows for systematic disambiguation and schema enforcement through iterative refinement. By leveraging collaborative agent workflows, AEC enables LLMs to produce precise, complete, and schema-consistent extractions in zero-shot settings. Experiments across five diverse domains and six LLMs demonstrate that AEC consistently outperforms prior zero-shot baselines, showcasing the power of treating event extraction like code generation. The code and data are available at https://github.com/UESTC-GQJ/Agent-Event-Coder.
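To make the idea of schemas as executable class definitions concrete, the following is a minimal sketch and not the authors' implementation. The `Protest` class, its roles (`protester`, `place`), the `verify` function, and the exact meaning given to the format, type, and semantic checks are all assumptions made for illustration. The point is only that a class definition lets a plain program, rather than an LLM, decide whether an extraction is valid and say why.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Type


@dataclass
class Event:
    """Hypothetical base schema: every event has a trigger word."""
    trigger: str


@dataclass
class Protest(Event):
    """Hypothetical event type; the role names are illustrative only."""
    protester: List[str] = field(default_factory=list)
    place: List[str] = field(default_factory=list)


def verify(event_cls: Type[Event], payload: Dict[str, object], text: str) -> List[str]:
    """Return diagnostics for a candidate extraction; an empty list means it passes."""
    diagnostics: List[str] = []
    allowed = {f.name for f in fields(event_cls)}

    # Format check (assumed meaning): no role outside the schema.
    for role in payload:
        if role not in allowed:
            diagnostics.append(
                f"format: role '{role}' is not defined in {event_cls.__name__}"
            )

    # Type check (assumed meaning): trigger is a string, roles are lists of strings.
    trigger = payload.get("trigger")
    if not isinstance(trigger, str):
        diagnostics.append("type: 'trigger' must be a string")
    elif trigger not in text:
        diagnostics.append(f"semantic: trigger '{trigger}' does not occur in the text")

    for role, value in payload.items():
        if role == "trigger" or role not in allowed:
            continue
        if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            diagnostics.append(f"type: role '{role}' must be a list of strings")
            continue
        # Semantic check (assumed meaning): every argument span occurs in the text.
        for span in value:
            if span not in text:
                diagnostics.append(f"semantic: '{span}' does not occur in the text")

    return diagnostics


text = "Workers went on strike outside the factory in Detroit."
good = {"trigger": "strike", "protester": ["Workers"], "place": ["Detroit"]}
bad = {"trigger": "strike", "protester": "Workers", "weapon": ["bat"], "place": ["Chicago"]}

good_report = verify(Protest, good, text)
bad_report = verify(Protest, bad, text)
```

In this sketch `bad` triggers one diagnostic of each kind: an undefined role (`weapon`), a wrongly typed argument (`protester` as a bare string), and a span that is absent from the text (`Chicago`). Messages of this shape are what a coding agent could be asked to patch against.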

Paper Structure

This paper contains 34 sections, 4 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: An illustrative example of the event extraction task. The blue box denotes the event type, while the green boxes represent the argument roles. Underlined words indicate the event trigger or the corresponding event arguments.
  • Figure 2: (a) Conceptual illustration of attention failure. The ideal model effectively leverages contextual information to correctly interpret the trigger word “strike” as an instance of the Protest event type. In contrast, direct zero-shot prompting of LLMs tends to over-rely on the trigger word itself, often leading to misclassification. (b) Illustration of extraction errors caused by insufficient structural fidelity. Outputs generated by direct zero-shot prompting of LLMs may violate the target event schema by: (1) including a non-existent argument role, (2) hallucinating an undefined argument, or (3) producing an argument with an incorrect data type.
  • Figure 3: Overview of the proposed AEC framework. The Full Pipeline View (top) illustrates four specialized agents (Retrieval, Planning, Coding, and Verification) collaborating to generate schema-compliant event objects from unstructured text. The Retrieval Agent self-generates relevant exemplars to bridge the gap between schema definitions and textual context. The Planning Agent produces $k$ trigger–type hypotheses, each with a confidence score and explanatory rationale. The Coding Agent converts the highest-confidence hypothesis into executable Python code that instantiates a predefined event schema. The Dynamic Traversal and Verification Block (bottom) depicts the iterative refinement loop. The generated code is evaluated by the Verification Agent through three deterministic test cases: semantic, type, and format checks (right). If a test fails, the agent patches the code using compiler-like diagnostics. When refinement attempts for the current plan are exhausted, the system backtracks to the next-best hypothesis. This dual-loop architecture ensures that the final output satisfies both semantic correctness and structural fidelity, without requiring any labeled examples.
  • Figure 4: Fixed output template used by all agents.
  • Figure 5: Impact of the number of test cases in verification.
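The dual-loop behaviour described in the Figure 3 caption (patch the current candidate until the refinement budget runs out, then backtrack to the next-best hypothesis) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm: `dual_loop`, `max_patches`, the toy `SCHEMAS`, and the `toy_*` functions are hypothetical stand-ins for the Planning, Coding, and Verification agents, which in AEC are LLM-driven. The `Attack` hypothesis is likewise invented for the demo.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hypothesis = Tuple[str, str, float]   # (trigger, event_type, confidence)
Candidate = Dict[str, object]


def dual_loop(
    hypotheses: List[Hypothesis],
    generate: Callable[[Hypothesis], Candidate],
    verify: Callable[[Candidate], List[str]],
    patch: Callable[[Candidate, List[str]], Candidate],
    max_patches: int = 2,
) -> Optional[Tuple[Hypothesis, Candidate]]:
    """Outer loop: hypotheses by descending confidence. Inner loop: patch and re-verify."""
    for hyp in sorted(hypotheses, key=lambda h: h[2], reverse=True):
        candidate = generate(hyp)
        for attempt in range(max_patches + 1):
            diagnostics = verify(candidate)
            if not diagnostics:
                return hyp, candidate          # all checks pass
            if attempt < max_patches:
                candidate = patch(candidate, diagnostics)
        # Refinement budget exhausted: backtrack to the next-best hypothesis.
    return None


# --- Toy stand-ins for the agents (assumptions, for demonstration only) ---
SCHEMAS = {
    "Attack": {"trigger", "attacker", "target"},
    "Protest": {"trigger", "place"},
}


def toy_generate(hyp: Hypothesis) -> Candidate:
    trigger, event_type, _ = hyp
    # Deliberately emits 'place' as a bare string so the type check fires.
    return {"event_type": event_type, "trigger": trigger, "place": "Detroit"}


def toy_verify(candidate: Candidate) -> List[str]:
    allowed = SCHEMAS[str(candidate["event_type"])] | {"event_type"}
    diagnostics = []
    for role, value in candidate.items():
        if role not in allowed:
            diagnostics.append(f"format: unknown role '{role}'")
        elif role not in ("event_type", "trigger") and not isinstance(value, list):
            diagnostics.append(f"type: role '{role}' must be a list")
    return diagnostics


def toy_patch(candidate: Candidate, diagnostics: List[str]) -> Candidate:
    # This toy patcher can only repair type errors, by wrapping the value in a list.
    patched = dict(candidate)
    for role, value in candidate.items():
        if f"type: role '{role}' must be a list" in diagnostics:
            patched[role] = [value]
    return patched


result = dual_loop(
    [("strike", "Protest", 0.4), ("strike", "Attack", 0.6)],
    toy_generate, toy_verify, toy_patch,
)
```

In this toy run the higher-confidence `Attack` hypothesis never passes, because the patcher cannot remove the undefined `place` role, so the loop backtracks and accepts `Protest` after a single type patch. If every hypothesis fails, `dual_loop` returns `None` and the caller has to decide what to do next.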