Instruction-Tuning LLMs for Event Extraction with Annotation Guidelines
Saurabh Srivastava, Sweta Pati, Ziyu Yao
TL;DR
This work investigates how annotation guidelines (textual descriptions of event types and their arguments) affect instruction-tuning of large language models for event extraction (EE). The authors represent EE outputs in a code-based format and augment the instructions with both human-written and machine-generated guidelines. They run experiments on the ACE05 and RichERE datasets, across multiple data regimes and model families, including LLaMA and Qwen. Well-constructed guidelines improve event-type discrimination, cross-schema generalization, and performance when training data is scarce. Machine-generated guidelines often outperform human-written ones, especially when diversity is ensured through multiple variants. The gains hold across models, domains, and schemas, which points to the practical value of automated guideline generation for scalable EE systems and to future work on zero-shot and low-resource extraction.
Abstract
In this work, we study the effect of annotation guidelines (textual descriptions of event types and arguments) when instruction-tuning large language models for event extraction. We conduct a series of experiments with both human-provided and machine-generated guidelines in full- and low-data settings. Our results demonstrate the promise of annotation guidelines when there is a decent amount of training data and highlight their effectiveness in improving cross-schema generalization and low-frequency event-type performance.
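
To make the idea of a code-based output format with guidelines more concrete, the following is a minimal sketch, not the authors' actual prompt format. It assumes each event type is rendered as a Python-style class whose docstring and comments carry the annotation guideline. The names EventSchema, render_schema, and build_prompt, the Attack event, and the guideline wording are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EventSchema:
    """Hypothetical container for one event type and its guideline text."""
    name: str                  # event type name, e.g. "Attack"
    guideline: str             # textual description of the event type
    arguments: Dict[str, str]  # argument role -> textual description


def render_schema(schema: EventSchema, with_guidelines: bool = True) -> str:
    """Render an event type as a class definition string.

    With with_guidelines=False, only the bare schema is emitted, which
    corresponds to instruction-tuning without annotation guidelines.
    """
    lines = [f"class {schema.name}(Event):"]
    if with_guidelines:
        lines.append(f'    """{schema.guideline}"""')
    lines.append("    mention: str")
    for role, description in schema.arguments.items():
        comment = f"  # {description}" if with_guidelines else ""
        lines.append(f"    {role}: List[str]{comment}")
    return "\n".join(lines)


def build_prompt(schemas: List[EventSchema], text: str,
                 with_guidelines: bool = True) -> str:
    """Assemble an instruction: event definitions, input text, open result list."""
    schema_block = "\n\n".join(render_schema(s, with_guidelines) for s in schemas)
    return (
        "# Event definitions\n"
        f"{schema_block}\n\n"
        "# Text\n"
        f'text = "{text}"\n\n'
        "# Extracted events\n"
        "result = ["
    )


# Illustrative schema; the guideline wording is an assumption, not dataset text.
ATTACK = EventSchema(
    name="Attack",
    guideline="Illustrative wording: a violent physical act that causes harm or damage.",
    arguments={
        "attacker": "the agent carrying out the attack",
        "target": "the person or object attacked",
    },
)

if __name__ == "__main__":
    print(build_prompt([ATTACK], "Rebels shelled the airport on Tuesday."))
```

Under these assumptions, toggling with_guidelines produces the two instruction variants being compared, and passing several differently worded EventSchema objects for the same event type would be one way to supply multiple guideline variants.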
