Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches

Adithya Pratapa, Teruko Mitamura

TL;DR

This work systematically contrasts compression-based and full-text approaches for large-scale multi-document summarization across three real-world datasets. By deploying long-context transformers (Llama-3.1, Command-R, Jamba-1.5-Mini) and three compression schemes (retrieval, hierarchical, incremental), the authors demonstrate that full-context and retrieval methods typically yield stronger content-selection performance (A3CU) than iterative compression, which suffers notable information loss. They reveal that compression methods retain salient content in intermediate steps but fail to preserve it through multi-stage pipelines, underscoring the need for hybrid architectures that fuse compression with long-context reasoning. The study also highlights the importance of reference-free evaluation and human judgments, and posits hybrid methods as a practical path toward scalable, high-quality large-scale multi-document summarization.

Abstract

Automatically summarizing large text collections is a valuable tool for document research, with applications in journalism, academic research, legal work, and many other fields. In this work, we contrast two classes of systems for large-scale multi-document summarization (MDS): compression and full-text. Compression-based methods use a multi-stage pipeline and often lead to lossy summaries. Full-text methods promise a lossless summary by relying on recent advances in long-context reasoning. To understand their utility on large-scale MDS, we evaluate them on three datasets, each containing approximately one hundred documents per summary. Our experiments cover a diverse set of long-context transformers (Llama-3.1, Command-R, Jamba-1.5-Mini) and compression methods (retrieval-augmented, hierarchical, incremental). Overall, we find that full-text and retrieval methods perform best in most settings. Further analysis of salient information retention shows that compression-based methods are promising at intermediate stages, even outperforming the full-context approach. However, they lose information through their multi-stage pipeline and lack of global context. Our results highlight the need to develop hybrid approaches that combine compression and full-text approaches for optimal performance on large-scale multi-document summarization.
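The full-context and compression strategies differ mainly in how the document collection reaches the model. The sketch below illustrates that control flow under stated assumptions: `toy_summarize` is a hypothetical extractive stand-in for an LLM call (it keeps the first sentence of each input), the chunk sizes are arbitrary, and the lexical-overlap retriever is a placeholder. None of these names or settings come from the paper.

```python
import re
from typing import Callable, List, Set

# Hypothetical interface: any function mapping a list of texts to one summary.
# In the paper this role is played by a long-context LLM; here it is a toy.
Summarizer = Callable[[List[str]], str]


def toy_summarize(texts: List[str]) -> str:
    """Toy extractive stand-in: keep the first sentence of each input text."""
    return " ".join(t.split(".")[0].strip() + "." for t in texts if t.strip())


def full_context(docs: List[str], summarize: Summarizer) -> str:
    """One call that sees the whole collection at once."""
    return summarize(docs)


def hierarchical(docs: List[str], summarize: Summarizer, chunk_size: int = 4) -> str:
    """Summarize groups of inputs, then summarize those summaries, until one remains."""
    assert docs and chunk_size >= 2, "need documents and groups of at least 2"
    level = docs
    while True:
        level = [summarize(level[i:i + chunk_size]) for i in range(0, len(level), chunk_size)]
        if len(level) == 1:
            return level[0]


def incremental(docs: List[str], summarize: Summarizer, chunk_size: int = 4) -> str:
    """Walk through the collection, folding each new chunk into a running summary."""
    running = ""
    for i in range(0, len(docs), chunk_size):
        context = [running] if running else []
        running = summarize(context + docs[i:i + chunk_size])
    return running


def _words(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def retrieval(docs: List[str], summarize: Summarizer, query: str, k: int = 4) -> str:
    """Keep the k documents with the most query-word overlap, then summarize once.

    Lexical overlap is a hypothetical stand-in for a real retriever.
    """
    q = _words(query)
    ranked = sorted(range(len(docs)), key=lambda i: len(q & _words(docs[i])), reverse=True)
    keep = sorted(ranked[:k])  # restore original document order
    return summarize([docs[i] for i in keep])


EXAMPLE_DOCS = [
    "Quake hits city. Details follow.",
    "Aid arrives. More text.",
    "Roads closed. Extra.",
    "Death toll rises. Extra.",
]

if __name__ == "__main__":
    print(full_context(EXAMPLE_DOCS, toy_summarize))
    print(hierarchical(EXAMPLE_DOCS, toy_summarize, chunk_size=2))
    print(incremental(EXAMPLE_DOCS, toy_summarize, chunk_size=2))
    print(retrieval(EXAMPLE_DOCS, toy_summarize, "death toll roads", k=2))
```

With `chunk_size=2`, the hierarchical run returns "Quake hits city. Roads closed.", dropping two of the four facts the full-context call keeps. The toy summarizer exaggerates this loss, but it shows the mechanism the abstract describes: each stage sees only part of the collection, so content discarded early cannot be recovered later.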

Paper Structure

This paper contains 23 sections, 3 figures, 16 tables.

Figures (3)

  • Figure 1: Salient information retention in the intermediate and final summaries (A3CU recall). For each compression method, we report the best recall from the intermediate outputs and the recall of the final summary. (H: hierarchical, I: incremental, R: retrieval, FC: full-context)
  • Figure 2: Salient information retention in the intermediate and final summaries (A3CU recall) for SummHay (oracle). For each compression method, we report the best recall from the intermediate outputs and the recall of the final summary. (H: hierarchical, I: incremental, R: retrieval, FC: full-context)
  • Figure 3: A3CU F1 score distribution across examples.
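Figure 1 compares, for each compression method, the best A3CU recall reached by any intermediate output with the recall of the final summary. A minimal sketch of that comparison follows, assuming the reference atomic content units (ACUs) are available as short strings. The substring matcher and the names `acu_recall` and `retention_report` are hypothetical; A3CU itself scores ACU coverage with a learned model, so real scores will differ.

```python
from typing import Dict, List


def acu_recall(summary: str, acus: List[str]) -> float:
    """Fraction of reference ACUs found in the summary.

    Toy stand-in: a case-insensitive substring match, not the A3CU model.
    """
    if not acus:
        return 0.0
    text = summary.lower()
    return sum(acu.lower() in text for acu in acus) / len(acus)


def retention_report(intermediates: List[str], final: str, acus: List[str]) -> Dict[str, float]:
    """Best recall among intermediate outputs vs. recall of the final summary."""
    best = max((acu_recall(s, acus) for s in intermediates), default=0.0)
    final_recall = acu_recall(final, acus)
    return {"best_intermediate": best, "final": final_recall, "drop": best - final_recall}


if __name__ == "__main__":
    acus = ["quake hits city", "aid arrives", "roads closed", "death toll rises"]
    report = retention_report(
        intermediates=["Quake hits city. Aid arrives. Roads closed."],
        final="Quake hits city. Roads closed.",
        acus=acus,
    )
    print(report)  # {'best_intermediate': 0.75, 'final': 0.5, 'drop': 0.25}
```

A positive `drop` corresponds to the pattern the figure captions describe: salient content that an intermediate step captured but the multi-stage pipeline failed to carry into the final summary.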