ULTRA: Unleash LLMs' Potential for Event Argument Extraction through Hierarchical Modeling and Pair-wise Self-Refinement

Xinliang Frederick Zhang; Carter Blum; Temma Choji; Shalin Shah; Alakananda Vempala

TL;DR

This work tackles document-level event argument extraction (DocEAE) with ULTRA, a hierarchical framework that first derives candidate arguments by processing document chunks locally and then filters them through pairwise self-refinement, while the LEAFER module corrects argument boundaries. An optional ensemble, ULTRA+, adds a document-level extractor to capture reasoning over the full article. ULTRA achieves state-of-the-art EM and HM on DocEE at lower monetary cost than strong baselines, aided by calibrated pairwise ranking and an inverted-pyramid pruning strategy. The approach generalizes well with limited annotations, and its tunable window size trades recall against precision, making it practical for real-world, cost-constrained deployments. The work advances the use of open-source LLMs for DocEAE and offers a blueprint for scalable, boundary-aware, document-wide information extraction.
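The TL;DR credits part of ULTRA's accuracy to calibrated pairwise ranking. The paper's exact prompts, calibration rule and inverted-pyramid pruning schedule are not described on this page, so what follows is only a minimal sketch of the general idea under stated assumptions. A judge, which in practice would be an LLM call and here is a hypothetical `Judge` callable returning the probability that the first-shown candidate is more pertinent, is queried in both presentation orders. The two answers are averaged to cancel a constant first-slot (positional) bias, and candidates are then ranked by accumulated preference in a round-robin tournament before the top `keep` are retained. The names `calibrated_preference`, `refine`, `keep` and the toy relevance scores are illustrative and do not come from the paper.

```python
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical judge: probability that `first` is more pertinent than `second`
# when `first` is shown in the first slot of the prompt.
Judge = Callable[[str, str], float]


def calibrated_preference(judge: Judge, a: str, b: str) -> float:
    """Average both presentation orders so a constant slot bias cancels."""
    return 0.5 * (judge(a, b) + (1.0 - judge(b, a)))


def refine(candidates: List[str], judge: Judge, keep: int) -> List[str]:
    """Round-robin pairwise tournament; return the `keep` top-scoring candidates."""
    score: Dict[str, float] = {c: 0.0 for c in candidates}
    for a, b in combinations(candidates, 2):
        p = calibrated_preference(judge, a, b)
        score[a] += p          # credit for a being preferred over b
        score[b] += 1.0 - p    # complementary credit for b
    ranked = sorted(candidates, key=lambda c: score[c], reverse=True)
    return ranked[:keep]


# Toy judge (assumption): true preference plus a +0.2 bias toward the first slot.
RELEVANCE = {"gunmen": 0.9, "Kabul": 0.8, "Monday": 0.6, "police": 0.1}


def biased_judge(first: str, second: str) -> float:
    return 0.5 + 0.3 * (RELEVANCE[first] - RELEVANCE[second]) + 0.2


if __name__ == "__main__":
    print(refine(list(RELEVANCE), biased_judge, keep=2))  # ['gunmen', 'Kabul']
```

With this toy judge the 0.2 slot bias cancels exactly, so `calibrated_preference` reduces to 0.5 + 0.3 times the relevance difference. Querying both orders over all pairs costs n(n-1) judge calls for n candidates, which is one reason a pruning step that shrinks the candidate pool matters in a cost-constrained setting.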

Abstract

Structural extraction of events within discourse is critical because it enables a deeper understanding of communication patterns and behavior trends. Event argument extraction (EAE), at the core of event-centric understanding, is the task of identifying role-specific text spans (i.e., arguments) for a given event. Document-level EAE (DocEAE) focuses on arguments that are scattered across an entire document. In this work, we explore open-source Large Language Models (LLMs) for DocEAE and propose ULTRA, a hierarchical framework that extracts event arguments more cost-effectively. ULTRA also alleviates the positional bias intrinsic to LLMs. It sequentially reads the text chunks of a document to generate a candidate argument set, from which non-pertinent candidates are dropped through self-refinement. We further introduce LEAFER to address the difficulty LLMs have in locating the exact boundary of an argument. ULTRA outperforms strong baselines, including supervised models and ChatGPT, by 9.8% in Exact Match (EM).
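The abstract reports its headline gain in Exact Match (EM), and the need for LEAFER follows from how strict that metric is: a span that is off by even one token earns no credit. The sketch below is a minimal, assumption-laden EM scorer, not the paper's evaluation code. It treats each argument as a (role, span) pair, matches predictions to gold as multisets, and requires exact string equality with no normalization of case, punctuation or articles; the evaluation used in the paper may differ in these details, HM is not covered, and the roles and spans are made-up examples.

```python
from collections import Counter
from typing import List, Tuple

Argument = Tuple[str, str]  # (role, span text)


def exact_match_prf(pred: List[Argument], gold: List[Argument]) -> Tuple[float, float, float]:
    """Precision, recall and F1 where a prediction counts only if its
    (role, span) pair equals a gold pair exactly (multiset matching)."""
    matched = sum((Counter(pred) & Counter(gold)).values())
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [("Attacker", "gunmen"), ("Victim", "three people"), ("Place", "Kabul")]
# The Victim span overshoots its boundary by one word, so it earns no EM credit.
pred = [("Attacker", "gunmen"), ("Victim", "three people killed"), ("Place", "Kabul")]

if __name__ == "__main__":
    print(exact_match_prf(pred, gold))  # all three values equal 2/3 (up to float rounding)
```

A single boundary error here costs a full third of both precision and recall, which illustrates why a dedicated boundary-correction step can move EM noticeably.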

Paper Structure (20 sections, 2 equations, 2 figures, 7 tables)

Introduction
Methodology
Layer-1: Local Understanding
LEAFER Module
Layer-2: Self-Refinement
Calibration.
Pruning.
Ensembling: ULTRA+
Experiments
Baselines.
Results and Analysis
Using ChatGPT for DocEAE.
Further Study on Window Size.
Conclusion
GPU resources.
...and 5 more sections

Figures (2)

Figure 1: The overall architecture of ULTRA+, which consists of ULTRA (left part) and a document extractor (bottom right). In ULTRA, local extractors (layer-1) first generate a candidate argument set by comprehending text chunks sequentially, upon which self-refinement (layer-2) is performed through pairwise comparison to filter out less pertinent candidates. The predicted boundaries in the initial candidate set are rectified by the LEAFER module.
Figure A1: The impact of window size on the performance of the Layer-1-only variant of ULTRA, measured on the dev set. As the window size increases, precision rises while recall falls, since fewer chunks are fed into ULTRA. F1 plateaus beyond a window size of 15.
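Figure A1 ties the precision/recall trade-off to the number of chunks fed into ULTRA. The sketch below assumes, as a simplification, that the window is counted in sentences and that chunks do not overlap; the paper's exact chunking rule is not given here. `collect_candidates` and its `extract` callable are hypothetical stand-ins for the Layer-1 local extractor. A larger window yields fewer chunks, hence fewer local extractor calls and plausibly fewer pooled candidates, which is consistent with the reported rise in precision and drop in recall.

```python
from typing import Callable, Dict, List


def chunk_sentences(sentences: List[str], window: int) -> List[List[str]]:
    """Split a document into consecutive, non-overlapping windows of
    `window` sentences; the last window may be shorter."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [sentences[i:i + window] for i in range(0, len(sentences), window)]


def collect_candidates(
    sentences: List[str], window: int, extract: Callable[[str], List[str]]
) -> List[str]:
    """Run a (hypothetical) local extractor on every chunk and pool the
    spans into one de-duplicated candidate set, in first-seen order."""
    pooled: Dict[str, None] = {}
    for chunk in chunk_sentences(sentences, window):
        for span in extract(" ".join(chunk)):
            pooled.setdefault(span, None)
    return list(pooled)


if __name__ == "__main__":
    doc = [f"Sentence {i}." for i in range(40)]
    counts = {w: len(chunk_sentences(doc, w)) for w in (5, 10, 15, 20)}
    print(counts)  # {5: 8, 10: 4, 15: 3, 20: 2}
```

For a 40-sentence document, windows of 5, 10, 15 and 20 sentences give 8, 4, 3 and 2 chunks respectively, so the number of Layer-1 calls, and their cost, drops quickly as the window grows.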
