Explicit, Implicit, and Scattered: Revisiting Event Extraction to Capture Complex Arguments

Omar Sharif, Joseph Gatto, Madhusudan Basak, Sarah M. Preum

TL;DR

This study revisits the definition of Event Extraction (EE) by introducing two key argument types, implicit and scattered, that existing EE frameworks cannot model, and develops a novel dataset, DiscourseEE, which includes 7,464 argument annotations from online health discourse.

Abstract

Prior works formulate the extraction of event-specific arguments as a span extraction problem, where event arguments are explicit -- i.e. assumed to be contiguous spans of text in a document. In this study, we revisit this definition of Event Extraction (EE) by introducing two key argument types that cannot be modeled by existing EE frameworks. First, implicit arguments are event arguments which are not explicitly mentioned in the text, but can be inferred through context. Second, scattered arguments are event arguments that are composed of information scattered throughout the text. These two argument types are crucial to elicit the full breadth of information required for proper event modeling. To support the extraction of explicit, implicit, and scattered arguments, we develop a novel dataset, DiscourseEE, which includes 7,464 argument annotations from online health discourse. Notably, 51.2% of the arguments are implicit, and 17.4% are scattered, making DiscourseEE a unique corpus for complex event extraction. Additionally, we formulate argument extraction as a text generation problem to facilitate the extraction of complex argument types. We provide a comprehensive evaluation of state-of-the-art models and highlight critical open challenges in generative event extraction. Our data and codebase are available at https://omar-sharif03.github.io/DiscourseEE.
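
The distinction between the three argument types can be made concrete with a small sketch. The heuristic below is an assumption for illustration only, not the annotation procedure used for DiscourseEE: it calls an argument explicit if its tokens appear as one contiguous run in the document, scattered if all of its tokens appear but not contiguously, and implicit otherwise. The function names and the example post are hypothetical; the suboxone scenario is borrowed from the Figure 2 caption.

```python
import re


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def argument_type(document: str, argument: str) -> str:
    """Rough, assumed heuristic for labeling an argument value.

    explicit  : the argument is one contiguous token span in the document
    scattered : every token occurs in the document, but not contiguously
    implicit  : at least one token never occurs in the document
    """
    doc_tokens = tokenize(document)
    arg_tokens = tokenize(argument)
    if not arg_tokens:
        return "null"

    n = len(arg_tokens)
    # Slide a window over the document to look for a contiguous match.
    for start in range(len(doc_tokens) - n + 1):
        if doc_tokens[start:start + n] == arg_tokens:
            return "explicit"

    if set(arg_tokens) <= set(doc_tokens):
        return "scattered"
    return "implicit"


# Hypothetical post, loosely modeled on the tapering example in Figure 2.
post = "I am tapering off suboxone. I was at 8mg and am now down to 2mg."

print(argument_type(post, "suboxone"))                      # explicit
print(argument_type(post, "tapering suboxone down to 2mg"))  # scattered
print(argument_type(post, "0mg"))                            # implicit
```

Token overlap is only a proxy: a real implicit argument may reuse words from the text, and a scattered argument may be paraphrased, which is part of why the paper frames extraction as text generation rather than span selection.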

Paper Structure (38 sections, 8 figures, 11 tables)

Figures (8)

  • Figure 1: An example demonstrating complex event arguments that are prevalent in online discourse. This Reddit post is narrated by a newly diagnosed prostate cancer patient who is seeking treatment information from online peers on the r/ProstateCancer subreddit. In addition to explicit arguments, it contains implicit and scattered arguments that cannot be extracted using one contiguous span of text.
  • Figure 2: Example annotation in DiscourseEE. Core arguments capture the key aspects of the advice, while type-specific, subject-specific, and effect-specific arguments capture the fine-grained details. An argument can be explicit, implicit, or scattered throughout the document; e.g., because the individual is tapering suboxone, the goal dosage is '0mg', which is not directly mentioned in the text. We annotate the arguments from posts and comments separately. However, due to label sparsity, we merge them during model evaluation. The argument value is set to 'null' if absent, and multiple values for a role are comma-separated.
  • Figure 3: Event ontology of the DiscourseEE dataset. Details of the arguments are provided in the paper's argument-details table.
  • Figure 4: Advice classification prompt template.
  • Figure 5: DiscourseEE development pipeline.
  • ...and 3 more figures
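
The value convention described in the Figure 2 caption ('null' when a role is absent, comma-separated values when a role has several, post and comment arguments merged for evaluation) can be sketched as follows. This is a minimal illustration under stated assumptions: the role names `medication` and `goal_dosage` are hypothetical, and merging by taking the order-preserving union of values is an assumption, since the caption does not say how the merge is performed.

```python
from typing import Dict, List, Optional


def parse_values(raw: Optional[str]) -> List[str]:
    """Split a comma-separated role value into a list, dropping 'null' and blanks."""
    if raw is None:
        return []
    values = [v.strip() for v in raw.split(",")]
    return [v for v in values if v and v.lower() != "null"]


def merge_arguments(
    post_args: Dict[str, str], comment_args: Dict[str, str]
) -> Dict[str, List[str]]:
    """Merge post and comment annotations per role (assumed: union, order kept)."""
    merged: Dict[str, List[str]] = {}
    for source in (post_args, comment_args):
        for role, raw in source.items():
            bucket = merged.setdefault(role, [])
            for value in parse_values(raw):
                if value not in bucket:
                    bucket.append(value)
    return merged


# Hypothetical annotations; role names are illustrative, not the dataset's schema.
post_args = {"medication": "suboxone", "goal_dosage": "null"}
comment_args = {"medication": "suboxone, clonidine", "goal_dosage": "0mg"}

print(merge_arguments(post_args, comment_args))
# {'medication': ['suboxone', 'clonidine'], 'goal_dosage': ['0mg']}
```

Splitting on commas assumes that no single argument value contains a comma; a value that does would need a different delimiter or quoting scheme.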