EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models
Jun Gao, Huan Zhao, Wei Wang, Changlong Yu, Ruifeng Xu
TL;DR
EventRL introduces outcome supervision for event extraction (EE) with LLMs. It targets instruction-following failures and hallucination by shaping policy updates with rewards based on Trigger-F1 and Argument-F1. The framework initializes from supervised fine-tuning (SFT), then applies reinforcement learning with an Arg-F1, AVG-F1, or Prod-F1 reward, paired with stabilization techniques such as Teacher-Force Threshold and Advantage Clipping. Experiments on ACE05 with LLaMa and CodeLLaMa models show EventRL consistently outperforming Few-Shot Prompting and standard SFT, with notable gains on unseen event types and with models pretrained on code data. The work highlights the importance of reward design and data modality, and reports that larger models generalize better only up to a point. Outcome supervision yields more robust EE, with better-formed event structures and fewer undefined event types, at the price of higher computational cost and with caveats about dataset quality.
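To make the three reward variants concrete, the following is a minimal sketch of how they could be computed from a model's predicted events. It rests on assumptions not confirmed by this summary: that AVG-F1 is the arithmetic mean and Prod-F1 the product of Trigger-F1 and Argument-F1 (as the names suggest), that F1 is computed over exact-match sets, and that triggers and arguments are represented as tuples. The function names `f1` and `outcome_reward` are hypothetical and are not taken from the authors' code.

```python
from typing import Hashable, Iterable


def f1(pred: Iterable[Hashable], gold: Iterable[Hashable]) -> float:
    """Exact-match F1 between a predicted set and a gold set of items."""
    pred_set, gold_set = set(pred), set(gold)
    true_positives = len(pred_set & gold_set)
    # Assumption: F1 is 0.0 when nothing matches, including when either set is empty.
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def outcome_reward(
    pred_triggers: Iterable[Hashable],
    gold_triggers: Iterable[Hashable],
    pred_args: Iterable[Hashable],
    gold_args: Iterable[Hashable],
    mode: str = "avg",
) -> float:
    """Scalar reward for one generated extraction (assumed definitions).

    mode="arg":  Argument-F1 only
    mode="avg":  mean of Trigger-F1 and Argument-F1 (assumed meaning of AVG-F1)
    mode="prod": product of Trigger-F1 and Argument-F1 (assumed meaning of Prod-F1)
    """
    trigger_f1 = f1(pred_triggers, gold_triggers)
    argument_f1 = f1(pred_args, gold_args)
    if mode == "arg":
        return argument_f1
    if mode == "avg":
        return (trigger_f1 + argument_f1) / 2
    if mode == "prod":
        return trigger_f1 * argument_f1
    raise ValueError(f"unknown reward mode: {mode!r}")
```

Under these assumptions, a prediction with a wrong trigger scores zero under the product reward but can still earn partial credit under the average, which is one way the choice of reward can change what the policy is pushed toward.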
Abstract
In this study, we present EventRL, a reinforcement learning approach developed to enhance event extraction for large language models (LLMs). EventRL utilizes outcome supervision with specific reward functions to tackle prevalent challenges in LLMs, such as poor instruction following and hallucination, which manifest as mismatched event structures and the generation of undefined event types. We evaluate EventRL against existing methods such as Few-Shot Prompting (FSP, based on GPT-4) and Supervised Fine-Tuning (SFT) across various LLMs, including GPT-4, LLaMa, and CodeLLaMa models. Our findings show that EventRL significantly outperforms these conventional approaches in identifying and structuring events, particularly when handling novel event types. The study emphasizes the critical role of reward function selection and demonstrates the benefits of incorporating code data for better event extraction. While increasing model size leads to higher accuracy, maintaining the ability to generalize is essential to avoid overfitting.
