Large Language Models for Document-Level Event-Argument Data Augmentation for Challenging Role Types

Joseph Gatto, Parker Seegmiller, Omar Sharif, Sarah M. Preum

TL;DR

This work introduces two novel LLM-powered data augmentation frameworks for synthesizing extractive document-level Event Argument Extraction (EAE) samples using zero in-domain training data. It also introduces a new metric, Role-Depth F1 (RDF1), which uses statistical depth to identify target-domain roles that are semantic outliers with respect to roles observed in the source domain.

Abstract

Event Argument Extraction (EAE) is an extremely difficult information extraction problem, with significant limitations in few-shot cross-domain (FSCD) settings. A common solution to FSCD modeling is data augmentation. Unfortunately, existing augmentation methods are not well suited to a variety of real-world EAE contexts, including (i) the need to model long documents (10+ sentences) and (ii) the need to model zero- and few-shot roles (i.e., event roles with little to no training representation). In this work, we introduce two novel LLM-powered data augmentation frameworks for synthesizing extractive document-level EAE samples using zero in-domain training data. Our highest-performing methods provide a 16-point increase in F1 score on extraction of zero-shot role types. To better facilitate analysis of cross-domain EAE, we additionally introduce a new metric, Role-Depth F1 (RDF1), which uses statistical depth to identify roles in the target domain that are semantic outliers with respect to roles observed in the source domain. Our experiments show that LLM-based augmentation can boost RDF1 performance by up to 11 F1 points compared to baseline methods.
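
To make the depth idea behind RDF1 concrete, the following is a minimal sketch of how target-domain roles might be ranked as semantic outliers relative to source-domain roles. It rests on several assumptions that are not stated in the abstract: each role is represented by an embedding vector, depth is computed as 2 minus the mean cosine distance to the source-role embeddings, and the vectors shown are invented toy values. The function names are hypothetical, and how the resulting ranking is turned into the final RDF1 score is not specified here.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (range 0 to 2)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def depth(x, reference):
    """Assumed depth score: 2 minus the mean cosine distance to the reference set.

    A low value means x sits far from the bulk of the reference embeddings.
    """
    mean_dist = sum(cosine_distance(x, r) for r in reference) / len(reference)
    return 2.0 - mean_dist


def rank_roles_by_depth(target_roles, source_embeddings):
    """Sort target role names from lowest depth (most outlying) to highest."""
    scores = {name: depth(vec, source_embeddings) for name, vec in target_roles.items()}
    return sorted(scores, key=scores.get)


# Toy 2-d embeddings, invented for illustration only.
source_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
target_roles = {
    "Influence People": [1.0, 0.1],
    "Maximum Wind Speed": [0.0, 1.0],
}

ranking = rank_roles_by_depth(target_roles, source_embeddings)
print(ranking)
```

In this toy setup, the role whose vector points away from the source embeddings receives the lowest depth and is listed first, so a depth threshold or a bottom-k cut could be used to select the outlier roles on which F1 is then reported.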

Paper Structure (36 sections, 1 equation, 6 figures, 5 tables)

Figures (6)

  • Figure 1: (a) We leverage LLMs' prior knowledge of Mad Libs to generate templated, categorized documents; the Mad Lib solutions can then be used as EAE annotations. (b) We generate event structure data for an event and use LLMs to convert the structure into an EAE document, employing semantic n-gram matching to align the structure with spans in the generated document.
  • Figure 2: Visualization showing how TTE Depth ranks DocEE roles. We find that roles such as "Maximum Wind Speed", "Damaged Crops & Livestock" and "Magnitude (Tsunami Heights)" are outliers compared to roles such as "Influence People" and "Temporary Settlement".
  • Figure 3: Example data for a Tsunami event using Struct-2-Text (GPT-4).
  • Figure 4: Example data for an Earthquake event using Mad Lib Aug (GPT-4).
  • Figure 5: Example data for a Droughts event using Mad Lib Aug (Llama-7b). Note that sample quality suffers for smaller LLMs. However, such samples still serve as a useful training signal, as shown in our results.
  • ...and 1 more figure
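
The Figure 1(a) caption notes that Mad Lib solutions can double as EAE annotations. The sketch below illustrates one way this could work, under assumptions that are not taken from the paper: slots are written as `[Role Name]` in the template, the solutions arrive as a role-to-text dictionary, and annotations are stored as character spans. The template, role names, and the `fill_template` helper are all hypothetical.

```python
import re

# Hypothetical slot syntax: a role name wrapped in square brackets.
SLOT = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template, solutions):
    """Fill each slot and record the character span of the inserted argument."""
    parts = []
    annotations = []
    cursor = 0   # current position in the template
    out_len = 0  # length of the document built so far
    for match in SLOT.finditer(template):
        # Copy the literal text between the previous slot and this one.
        literal = template[cursor:match.start()]
        parts.append(literal)
        out_len += len(literal)

        # Insert the solution and remember where it landed.
        role = match.group(1)
        argument = solutions[role]
        annotations.append({
            "role": role,
            "text": argument,
            "start": out_len,
            "end": out_len + len(argument),
        })
        parts.append(argument)
        out_len += len(argument)
        cursor = match.end()

    parts.append(template[cursor:])  # trailing literal text
    return "".join(parts), annotations


# Invented example; not drawn from the paper's generated data.
template = "A magnitude [Magnitude] earthquake struck [Location] on [Date]."
solutions = {"Magnitude": "6.1", "Location": "the coastal region", "Date": "Monday"}

document, annotations = fill_template(template, solutions)
print(document)
for ann in annotations:
    print(ann["role"], "->", document[ann["start"]:ann["end"]])
```

Because each argument is inserted by the filling step itself, its span is known exactly, which is what makes the result usable as extractive annotation without a separate alignment pass.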