Asking and Answering Questions to Extract Event-Argument Structures

Md Nayem Uddin, Enfa Rose George, Eduardo Blanco, Steven Corman

TL;DR

This work reframes document-level event-argument extraction as a question-answering task, introducing two question-generation paradigms (template- and transformer-based) and novel data augmentation strategies to address inter-sentential arguments. By leveraging transfer learning from existing corpora and a RoBERTa-based QA reader, the approach achieves competitive RAMS results, surpassing prior state-of-the-art when using transformer-generated questions and augmented data. Zero-/few-shot GPT-3 experiments show the supervised QA approach remains superior, while analyses reveal the method's strength in inter-sentential argument extraction and its vulnerabilities to annotation and coreference errors. Overall, the study demonstrates a scalable, generalizable framework for extracting rich event-argument structures across sentences, with meaningful implications for information extraction and downstream NLP tasks.

Abstract

This paper presents a question-answering approach to extract document-level event-argument structures. We automatically ask and answer questions for each argument type an event may have. Questions are generated using manually defined templates and generative transformers. Template-based questions are generated using predefined role-specific wh-words and event triggers from the context document. Transformer-based questions are generated using large language models trained to formulate questions based on a passage and the expected answer. Additionally, we develop novel data augmentation strategies specialized in inter-sentential event-argument relations. We use a simple span-swapping technique, coreference resolution, and large language models to augment the training instances. Our approach enables transfer learning without any corpus-specific modifications and yields competitive results on the RAMS dataset. It outperforms previous work, and it is especially beneficial for extracting arguments that appear in a different sentence than the event trigger. We also present detailed quantitative and qualitative analyses shedding light on the most common errors made by our best model.
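As a rough illustration of the template-based paradigm described in the abstract, the following minimal sketch builds one question per argument role from a role-specific wh-word and the event trigger. The wh-word mapping, the question wording, and the function names are assumptions made for illustration only; the templates actually used in the paper may differ. The roles and the trigger are taken from the Figure 1 example.

```python
# Hypothetical role -> wh-word mapping (an assumption, not the paper's table).
WH_WORDS = {
    "artifact": "What",
    "transporter": "Who",
    "vehicle": "What",
    "origin": "Where",
}


def template_question(role: str, trigger: str) -> str:
    """Build one question from a role-specific wh-word and the event trigger.

    The wording of the template is illustrative; roles without an entry in
    WH_WORDS fall back to "What".
    """
    wh_word = WH_WORDS.get(role, "What")
    return f"{wh_word} is the {role} of the event {trigger}?"


def questions_for_event(trigger: str, roles: list[str]) -> dict[str, str]:
    """Generate one question for each argument role the event may have."""
    return {role: template_question(role, trigger) for role in roles}


if __name__ == "__main__":
    roles = ["artifact", "transporter", "vehicle", "origin"]
    for role, question in questions_for_event("importing", roles).items():
        print(f"{role}: {question}")
```

Each generated question would then be paired with the context document and passed to an extractive QA reader, which returns a span as the argument or no answer when the role is not filled.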

Paper Structure (37 sections, 7 figures, 9 tables)


Figures (7)

  • Figure 1: Event trigger (importing) and its arguments in the same (artifact and transporter) and surrounding sentences (vehicle and origin). We cast the problem of extracting the arguments of an event as a question-answering task. Questions are automatically generated (and answered) for each argument an event may have.
  • Figure 2: Examples of the data augmentation strategies (gold event: agreements, highlighted in red; gold argument: Clinton, highlighted in green). Blue highlights indicate the arguments in the augmented samples. SS stands for Simple Swapping (P: Plain, V: Verbose), CR for Coreference Resolution (R: Random, M: Most Meaningful), and LLM for Large Language Model (P: Pegasus, G: GPT-3). In the gold sample, the event-argument is intra-sentential. Five of the six data augmentation strategies result in an inter-sentential argument.
  • Figure 3: F1 per argument of our best model (boldfaced in Table \ref{t:results_transfer}, large). Frequency in training (in parentheses) is only a weak indicator of F1, suggesting that some arguments are easier to learn. For example, employee is less frequent than participant, yet the former obtains roughly twice the F1 (0.70 vs. 0.33).
  • Figure 4: Average F1 per event (top 15 most frequent events) by our best model (boldfaced in Table \ref{t:results_transfer}, large). There is no clear relation between event frequency in training (in parentheses) and F1, suggesting that arguments of some events are easier to learn (e.g., selfdirectedbattle vs. transportartifact).
  • Figure 5: Confusion matrix comparing gold (rows) and predicted (columns) argument types for correctly predicted argument spans (top 15 most frequent types). Most errors are plausible (at face value) but semantically wrong argument types (e.g., mislabeling the beneficiary as the recipient; note that both are usually people).
  • ...and 2 more figures