PreSumm: Predicting Summarization Performance Without Summarizing

Steven Koniaev, Ori Ernst, Jackie Chi Kit Cheung

TL;DR

The paper asks why automatic summarization performance varies across documents and proposes PreSumm, a task that predicts a document’s average summarization quality across multiple systems using only the document text. It introduces a RoSE-based dataset with ACU-driven gold scores and develops several models (PreSummReg, PreSummClas, PreSummRegFroz); PreSummReg achieves the strongest correlation with gold scores, outperforming baselines and, in certain settings, even some reference-based metrics. The authors demonstrate PreSumm’s practical value through extrinsic evaluations: prioritizing documents for manual summarization in hybrid systems and improving multi-document summarization by filtering or reordering documents, with strong generalization to out-of-domain Pyramid evaluations. Manual analysis reveals that documents with low PreSummReg scores tend to have coherence problems, complex content, and no single clear theme, offering insights for future improvements in summarization systems. Limitations include the dataset’s news-domain focus and its reliance on resource-intensive ACU annotations, which suggests future work on broader domains and automated evaluation proxies.

Abstract

Despite recent advancements in automatic summarization, state-of-the-art models do not summarize all documents equally well, raising the question: why? While prior research has extensively analyzed summarization models, little attention has been given to the role of document characteristics in influencing summarization performance. In this work, we explore two key research questions. First, do documents exhibit consistent summarization quality across multiple systems? If so, can we predict a document's summarization performance without generating a summary? We answer both questions affirmatively and introduce PreSumm, a novel task in which a system predicts summarization performance based solely on the source document. Our analysis sheds light on common properties of documents with low PreSumm scores, revealing that they often suffer from coherence issues, complex content, or a lack of a clear main theme. In addition, we demonstrate PreSumm's practical utility in two key applications: improving hybrid summarization workflows by identifying documents that require manual summarization and enhancing dataset quality by filtering outliers and noisy documents. Overall, our findings highlight the critical role of document properties in summarization performance and offer insights into the limitations of current systems that could serve as the basis for future improvements.
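
To make the task setup concrete, here is a minimal sketch of how a per-document target and a correlation-based evaluation could be computed. It assumes the gold score for a document is the plain mean of that document's quality scores across systems, and it uses Pearson correlation as the evaluation measure. The document IDs, the score values, the predictions, and the helper names `gold_targets` and `pearson` are all hypothetical illustrations, not the authors' data or code.

```python
from math import sqrt
from statistics import mean

# Hypothetical quality scores for each document from three summarization
# systems (illustrative numbers only, not taken from the paper).
system_scores = {
    "doc_a": [0.42, 0.38, 0.46],
    "doc_b": [0.15, 0.21, 0.18],
    "doc_c": [0.30, 0.33, 0.27],
}


def gold_targets(scores_by_doc):
    """Average each document's scores across systems: one target per document."""
    return {doc: mean(scores) for doc, scores in scores_by_doc.items()}


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


targets = gold_targets(system_scores)

# Hypothetical outputs of a model that sees only the source document.
predictions = {"doc_a": 0.40, "doc_b": 0.20, "doc_c": 0.31}

docs = sorted(targets)
r = pearson([targets[d] for d in docs], [predictions[d] for d in docs])

# In a hybrid workflow, the lowest-scoring document would be the first
# candidate for manual summarization.
manual_first = min(predictions, key=predictions.get)

print(f"correlation with gold targets: {r:.3f}")
print(f"route to manual summarization first: {manual_first}")
```

The same predicted scores could also drive the filtering use case: ranking documents by predicted score and dropping those below a chosen threshold, where the threshold itself would have to be tuned for the application.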

Paper Structure

This paper contains 36 sections, 2 equations, 1 figure, 13 tables, 1 algorithm.

Figures (1)

  • Figure 1: An illustration comparing the traditional evaluation process (top) with our approach (bottom).