Generating Uncontextualized and Contextualized Questions for Document-Level Event Argument Extraction

Md Nayem Uddin, Enfa Rose George, Eduardo Blanco, Steven Corman

TL;DR

This work reframes document-level event argument extraction as a question answering task and introduces multiple question-generation strategies that do not require manual annotation. It distinguishes uncontextualized questions (template- and prompt-based) from contextualized questions (SQuAD-based and weakly supervised from LLMs) and demonstrates that a hybrid of these signals yields the strongest performance, especially for inter-sentential arguments. The approach achieves competitive RAMS results, transfers to WikiEvents, and provides a detailed qualitative error analysis, arguing for the practicality of event-grounded question generation as a corpus-agnostic augmentation technique. The findings highlight the value of grounding questions in event context and document cues, with practical implications for robust, cross-domain event-argument extraction in downstream applications.

Abstract

This paper presents multiple question generation strategies for document-level event argument extraction. These strategies do not require human involvement and result in uncontextualized questions as well as contextualized questions grounded on the event and document of interest. Experimental results show that combining uncontextualized and contextualized questions is beneficial, especially when event triggers and arguments appear in different sentences. Our approach does not have corpus-specific components; in particular, the question generation strategies transfer across corpora. We also present a qualitative analysis of the most common errors made by our best model.
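To make the question-answering framing concrete, the following is a minimal sketch of how an uncontextualized, template-based question could be paired with a document to form an extractive QA input. The template wording, the `QAInstance` type, the function names, and the toy document are all assumptions made for illustration; the paper's actual templates and models are not shown in this excerpt.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical role-specific template (assumption): the paper's real
# templates are not reproduced here.
ROLE_TEMPLATE = "What is the {role} of the {trigger} event?"


@dataclass
class QAInstance:
    """One question paired with the full document as context."""
    question: str
    context: str


def template_question(role: str, trigger: str) -> str:
    """Fill the template with an argument role and an event trigger."""
    return ROLE_TEMPLATE.format(role=role, trigger=trigger)


def build_instances(document: str, trigger: str, roles: List[str]) -> List[QAInstance]:
    """Create one QA instance per argument role of the given event trigger."""
    return [QAInstance(template_question(role, trigger), document) for role in roles]


# Toy document invented for illustration; it is not taken from RAMS.
toy_document = "The company began importing grain last spring. The shipments arrived by sea."
instances = build_instances(toy_document, "importing", ["artifact", "origin"])
for inst in instances:
    print(inst.question)
```

Because the template depends only on the role and the trigger, the same question is produced regardless of the document, which is what makes it uncontextualized. A contextualized strategy would additionally condition the question on the document text.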

Paper Structure (35 sections, 10 figures, 8 tables)

Figures (10)

  • Figure 1: Example from RAMS (top, event trigger importing and its arguments). In this paper, we experiment with several strategies to generate questions for event argument extraction. Questions for the artifact of importing are shown in the dashed box. Q1 is generated following a role-specific template, Q2 and Q3 are generated by prompting GPT-4, and Q4 and Q5 are generated by a weakly supervised T5 model.
  • Figure 2: F1 per argument of our best system (boldfaced in the combined results table). Frequency in training (in parentheses) is only a weak indicator of F1, leading to the conclusion that some arguments are easier to learn (e.g., passenger is 70% less frequent than participant, yet the former obtains twice the F1: 0.70 vs. 0.33).
  • Figure 3: Average F1 per event (top 15 most frequent events) by our best system (boldfaced in the combined results table). There is no clear relation between event frequency in training (in parentheses) and F1, leading to the conclusion that arguments of some events are easier to learn (e.g., selfdirectedbattle vs. payforservice).
  • Figure 4: Confusion matrix comparing gold (rows) and predicted (columns) argument roles for correctly predicted argument spans (top 15 most frequent types). Most errors are plausible (at face value) but semantically wrong argument roles (e.g., mislabeling the beneficiary as the recipient; note that both are usually people).
  • Figure 5: Examples of three categories of questions generated by the weakly supervised T5 model. Text highlighted in green indicates the events, while red text indicates the arguments within the RAMS documents. Argument roles are given in parentheses. Questions in the category "Question about the event and the argument role" are grounded on both the event and the argument role, those in "Question about the event" are grounded only on the event, and those in "Question about neither the event nor the argument role" are irrelevant to both. "Generated Question" is the output of the weakly supervised model, and "Expected Question" is a hypothetical human-written question exemplifying the ideal question we would like to generate.
  • ...and 5 more figures