Document-Level Event Extraction with Definition-Driven ICL

Zhuoyuan Liu, Yilin Luo

TL;DR

This paper tackles prompt design challenges in document-level event extraction by introducing Definition-driven Document-level Event Extraction (DDEE). It combines prompt length optimization, task decomposition with history-aware information, enhanced heuristic prompts, chain-of-thought guidance, and data balancing to improve generalization and precision when using large language models. The approach is evaluated on the WikiEvents dataset with GPT-4 family models, showing improvements in robustness and structured output, while also highlighting the nuanced effects of chain-of-thought prompts. Overall, the work offers practical strategies for prompt engineering that can enhance LLM-based information extraction and potentially extend to other NLP tasks.

Abstract

In Natural Language Processing (NLP), Large Language Models (LLMs) have shown great potential for document-level event extraction, but existing methods face challenges in prompt design. To address this issue, we propose an optimization strategy called "Definition-driven Document-level Event Extraction (DDEE)." By adjusting prompt length and making the heuristics clearer, we significantly improve the event extraction performance of LLMs. We use data balancing techniques to address the long-tail effect, which enhances the model's ability to generalize across event types. At the same time, we refine the prompt so that it is both concise and comprehensive, accounting for the sensitivity of LLMs to prompt style. In addition, structured heuristic methods and strict limiting conditions improve the precision of event and argument role extraction. These strategies not only address the prompt engineering problems of LLMs in document-level event extraction but also advance event extraction technology and offer new research perspectives for other NLP tasks.
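
The abstract does not say which data balancing technique is used; the Figure 1 caption mentions resampling the dataset. As a rough illustration only, the sketch below balances a pool of annotated examples by event type, downsampling frequent types and oversampling rare ones to a common size. The function name `balance_by_event_type`, the `event_type` key, and the `target_per_type` parameter are assumptions made for this example, not the authors' implementation.

```python
import random
from collections import defaultdict


def balance_by_event_type(examples, target_per_type, seed=0):
    """Resample examples so every event type appears target_per_type times.

    Hypothetical helper: each example is assumed to be a dict with an
    "event_type" key. Frequent types are downsampled without replacement;
    rare (long-tail) types keep all originals and are topped up with
    duplicates drawn with replacement.
    """
    rng = random.Random(seed)  # fixed seed so the resampling is repeatable

    # Group the examples by their event type.
    by_type = defaultdict(list)
    for example in examples:
        by_type[example["event_type"]].append(example)

    balanced = []
    for group in by_type.values():
        if len(group) >= target_per_type:
            balanced.extend(rng.sample(group, target_per_type))
        else:
            balanced.extend(group)
            balanced.extend(rng.choices(group, k=target_per_type - len(group)))

    rng.shuffle(balanced)  # avoid long runs of a single event type
    return balanced
```

Oversampling by duplication is only one possible choice; it keeps rare types visible at the cost of repeating the same few examples.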

Paper Structure (22 sections, 7 figures, 2 tables)

This paper contains 22 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Event Extraction Framework. In this study, we resampled the WikiEvents dataset and clearly defined event types and argument roles. Using a heuristic event framework, we applied a two-step process to the balanced dataset: first extracting event types and triggers, then using these results for argument extraction, with the aim of improving the accuracy and efficiency of event extraction.
  • Figure 2: Event Extraction Process based on balanced dataset. This figure provides a detailed illustration of the event extraction process based on a balanced dataset, encompassing event detection, argument extraction, and the final event extraction results. Specifically, it demonstrates how various types of events (such as Movement.Transportation.Unspecified and Conflict.Attack.Unspecified) and their triggers (e.g., "was driven" and "hit") are identified from text segments. It further outlines the extraction of associated argument roles and text (such as Vehicle, Destination, Instrument, Target, Victim). The output is a structured list of events, each comprehensively detailing its type, trigger, corresponding argument roles, and textual descriptions. This process not only enhances the accuracy of event extraction but also enriches the understanding of event contexts within the text.
  • Figure 3: Event Detection Prompting. This figure illustrates the first step of the event extraction prompting process: event detection prompting. It involves defining the task, specifying extraction rules, and employing definition-driven in-context learning (ICL) to extract event types and triggers from documents. The system inputs include document content, task definitions, extraction rules, and event trigger definitions. The output is a JSON array of objects, each containing a document identifier, an event type, and a trigger. For instance, it identifies the event type "Movement.Transportation.Unspecified" and the trigger "was driven" from the text.
  • Figure 4: Argument Role Extraction Prompting. The figure illustrates the second step of the event extraction prompting process: argument role extraction, which uses natural language processing (NLP) techniques to extract argument roles and their corresponding text from documents. Inputs include document content, task definitions, extraction rules, definition-driven in-context learning (ICL), and the previously identified event types and triggers. The system formats each event as a JSON object following a predefined output structure and returns the objects in an array. For instance, for the event type 'Movement.Transportation.Unspecified,' the system identifies 'driven' as the trigger, extracts 'vehicle' as the text for the 'Vehicle' role, and 'Santander National Police Academy' as the text for the 'Destination' role.
  • Figure 5: Argument Role Distribution. The figure shows that argument roles in the original dataset are unevenly distributed, with a long tail.
  • ...and 2 more figures
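
The two-step flow described in the captions for Figures 3 and 4 can be sketched roughly as follows: a detection prompt yields a JSON array of event types and triggers, and that array is then fed into a second prompt for argument roles. This is a minimal sketch under stated assumptions, not the paper's prompts or code. The template wording, the key names (`doc_id`, `event_type`, `trigger`, `arguments`), the function names, and the `llm` callable (any function that maps a prompt string to a reply string) are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical prompt templates; the paper's actual wording is not shown here.
DETECTION_TEMPLATE = (
    "Step 1. Identify every event in the document.\n"
    "Event type and trigger definitions:\n{definitions}\n"
    "Return only a JSON array of objects with the keys "
    "doc_id, event_type, trigger.\n"
    "Document ({doc_id}):\n{text}\n"
)
ARGUMENT_TEMPLATE = (
    "Step 2. For each detected event, extract its argument roles.\n"
    "Argument role definitions:\n{role_definitions}\n"
    "Detected events:\n{events}\n"
    "Return only a JSON array; copy each event and add an arguments list "
    "of objects with the keys role and text.\n"
    "Document ({doc_id}):\n{text}\n"
)


def parse_json_array(raw: str) -> list:
    """Parse a reply that should contain a JSON array; return [] on failure."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        data = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return []
    # Keep only JSON objects; anything else is treated as malformed output.
    return [item for item in data if isinstance(item, dict)]


def extract_events(
    doc_id: str,
    text: str,
    definitions: str,
    role_definitions: str,
    llm: Callable[[str], str],
) -> list:
    """Run event detection, then argument extraction on the detected events."""
    detected = parse_json_array(
        llm(DETECTION_TEMPLATE.format(
            definitions=definitions, doc_id=doc_id, text=text))
    )
    if not detected:
        return []  # nothing detected, so the second prompt is skipped

    # Step 2 receives the step 1 results as part of its prompt.
    return parse_json_array(
        llm(ARGUMENT_TEMPLATE.format(
            role_definitions=role_definitions,
            events=json.dumps(detected, ensure_ascii=False),
            doc_id=doc_id,
            text=text,
        ))
    )
```

Passing the model as a plain callable keeps the sketch independent of any particular API client, and the lenient parser reflects the fact that model replies may wrap the JSON array in extra text.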