Cross-Document Event-Keyed Summarization
William Walden, Pavlo Kuchmiichuk, Alexander Martin, Chihsheng Jin, Angela Cao, Claire Sun, Curisia Allen, Aaron Steven White
TL;DR
This work extends event-keyed summarization (EKS) to the cross-document setting by introducing SEAMuS, an expert-annotated dataset derived from the FAMuS cross-document argument extraction (CDAE) annotations that supports both single-document and cross-document event-centered summaries. It benchmarks a range of models, from fine-tuned encoder-decoder architectures to zero- and few-shot prompted LLMs, on report-only and cross-document tasks, supplemented by extensive input ablations and robustness analyses under extraction noise. The results show that large models and few-shot prompting improve performance over a report baseline, but cross-document summarization remains the more challenging task; robust performance is achievable through careful input structure (Text+Event) and retrieval-based context selection. Human evaluation confirms generally high-quality outputs across models but reveals variability across raters, underscoring the need for task-oriented evaluation that accounts for user-specific preferences. SEAMuS is released to spur further research into reliable, event-focused synthesis across documents, with planned expansions to more sources and larger-scale annotation efforts.
Abstract
Event-keyed summarization (EKS) requires summarizing a specific event described in a document given the document text and an event representation extracted from it. In this work, we extend EKS to the cross-document setting (CDEKS), in which summaries must synthesize information from accounts of the same event given by multiple sources. We introduce SEAMuS (Summaries of Events Across Multiple Sources), a high-quality dataset for CDEKS based on an expert reannotation of the FAMuS dataset for cross-document argument extraction. We present a suite of baselines on SEAMuS, covering both smaller, fine-tuned models and zero- and few-shot prompted LLMs, along with detailed ablations and a human evaluation study, showing SEAMuS to be a valuable benchmark for this new task.
