Event-Keyed Summarization

William Gantt, Alexander Martin, Pavlo Kuchmiichuk, Aaron Steven White

TL;DR

This work introduces event-keyed summarization (EKS), a task that generates a targeted summary of a specific event described in a document by combining event extraction with abstractive summarization. It presents MUCSUM, a dataset built on the classic MUC-4 template annotations, which supports evaluation of summaries that must fuse document context with a structured event template; ablations confirm that both the document and the event template are needed for high-quality, contextualized summaries. The authors benchmark fine-tuned pretrained language models (BART, T5, PEGASUS) and zero-shot prompting of larger models (ChatGPT, GPT-4), using ROUGE, BERTScore, CEAF-REE, and NLI-based metrics, complemented by human judgments. Fine-tuned models given the joint document-template input outperform their ablated counterparts, while zero-shot prompting yields reasonable summaries that overlap less with the references; human evaluation indicates that the reference summaries remain superior. Overall, MUCSUM provides a robust, targeted benchmark for EKS and shows that input modality, model choice, and evaluation metric interact, with practical implications for producing event-centered summaries in information-seeking contexts.

Abstract

We introduce event-keyed summarization (EKS), a novel task that marries traditional summarization and document-level event extraction, with the goal of generating a contextualized summary for a specific event, given a document and an extracted event structure. We introduce a dataset for this task, MUCSUM, consisting of summaries of all events in the classic MUC-4 dataset, along with a set of baselines that comprises both pretrained LM standards in the summarization literature and larger frontier models. We show that ablations that reduce EKS to traditional summarization or structure-to-text yield inferior summaries of target events and that MUCSUM is a robust benchmark for this task. Lastly, we conduct a human evaluation of both reference and model summaries, and provide a detailed analysis of the results.
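
The task setup and its two ablations can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen for illustration only: the `EventTemplate` class, the `linearize` and `build_input` helpers, the `" | "` and `" [SEP] "` separators, and the slot names are assumptions, not the authors' actual input format. The sketch only shows how a full EKS input (template plus document) differs from the document-only ablation (traditional summarization) and the template-only ablation (structure-to-text).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EventTemplate:
    """Hypothetical container for one extracted event (not the paper's schema)."""
    incident_type: str
    # Maps a role name to its filler strings; empty lists mean unfilled roles.
    slots: Dict[str, List[str]] = field(default_factory=dict)


def linearize(template: EventTemplate) -> str:
    """Flatten a template into one string, skipping unfilled roles."""
    parts = [f"incident type: {template.incident_type}"]
    for role, fillers in template.slots.items():
        if fillers:
            parts.append(f"{role}: {', '.join(fillers)}")
    return " | ".join(parts)


def build_input(document: str, template: EventTemplate, mode: str = "full") -> str:
    """Build the model input for the full task or one of its ablations."""
    if mode == "full":            # EKS: template and document together
        return f"{linearize(template)} [SEP] {document}"
    if mode == "document_only":   # ablation: traditional summarization
        return document
    if mode == "template_only":   # ablation: structure-to-text
        return linearize(template)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Invented example text and fillers, for illustration only.
    doc = "A bomb exploded outside the embassy late on Tuesday."
    event = EventTemplate("bombing", {"Target": ["embassy"], "Victim": [], "Weapon": ["bomb"]})
    for mode in ("full", "document_only", "template_only"):
        print(f"{mode}: {build_input(doc, event, mode)}")
```

Any of the three strings could then be passed to a sequence-to-sequence model or a prompt; which linearization works best is an empirical question this sketch does not address.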

Paper Structure (34 sections, 2 equations, 2 figures, 5 tables)

Figures (2)

  • Figure 1: An illustration of the event-keyed summarization (EKS) task on a document and event template from the MUCSUM training split. Given a document and event template, a system must generate a contextualized summary of that specific event.
  • Figure 2: Distribution of ratings for models' summaries across 30 documents in quality evaluation.