Unsupervised Summarization Re-ranking

Mathieu Ravaut, Shafiq Joty, Nancy Chen

TL;DR

This paper proposes to re-rank summary candidates in an unsupervised manner, aiming to close the performance gap between unsupervised and supervised models.

Abstract

With the rise of task-specific pre-training objectives, abstractive summarization models like PEGASUS offer appealing zero-shot performance on downstream summarization tasks. However, the performance of such unsupervised models still lags significantly behind their supervised counterparts. As in the supervised setup, we observe very high variance in quality among summary candidates from these models, yet only one candidate is kept as the summary output. In this paper, we propose to re-rank summary candidates in an unsupervised manner, aiming to close the performance gap between unsupervised and supervised models. Our approach improves the unsupervised PEGASUS by up to 7.27% and ChatGPT by up to 6.86% relative mean ROUGE across four widely-adopted summarization benchmarks, and achieves relative gains of 7.51% (up to 23.73% from XSum to WikiHow) averaged over 30 zero-shot transfer setups (fine-tuning on one dataset, evaluating on another).
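To make the idea of reference-free re-ranking concrete, here is a minimal sketch. It is not the paper's scoring function: it assumes a single hypothetical feature, unigram-overlap F1 between each candidate and the source document, and the names `unigram_f1` and `rerank` are illustrative only. The point is simply that candidates can be ordered without access to a gold summary.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized, lowercased strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rerank(source: str, candidates: list[str]) -> list[str]:
    """Order candidates by their overlap with the source, best first.

    Assumption: overlap with the source is used as a stand-in quality signal;
    no reference summary is consulted at any point.
    """
    return sorted(candidates, key=lambda c: unigram_f1(c, source), reverse=True)


source = "the cat sat on the mat near the door"
candidates = ["a dog ran away", "the cat sat on the mat"]
best = rerank(source, candidates)[0]
print(best)
```

In practice a single lexical feature like this would favor extractive candidates; it only illustrates the selection step, where the top-ranked candidate replaces the model's default output.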

Paper Structure (31 sections, 9 equations, 5 figures, 34 tables)

This paper contains 31 sections, 9 equations, 5 figures, 34 tables.

Figures (5)

  • Figure 1: SummScore (unsupervised) re-ranking construction. SummScore leverages the source document for semantic similarity comparisons with summary candidates, as well as to extract a pseudo target.
  • Figure 2: Recall curves on CNN/DM with PEGASUS backbone. The top left plot corresponds to unsupervised summarization re-ranking from \ref{tab:3b}, and the next seven plots to all zero-shot transfer summarization setups from \ref{tab:4}. Each re-ranking setup has 20 summary candidates, and we show recall over any oracle candidate for several thresholds $k \in \{1,2,3,4,5,7,10\}$.
  • Figure 3: Recall curves on XSum with PEGASUS backbone. The top left plot corresponds to unsupervised summarization re-ranking from \ref{tab:3b}, and the next seven plots to all zero-shot transfer summarization setups from \ref{tab:4}. Each re-ranking setup has 20 summary candidates, and we show recall over any oracle candidate for several thresholds $k \in \{1,2,3,4,5,7,10\}$.
  • Figure 4: Recall curves on WikiHow with PEGASUS backbone. The top left plot corresponds to unsupervised summarization re-ranking from \ref{tab:3b}, and the next eight plots to all zero-shot transfer summarization setups from \ref{tab:4}. Each re-ranking setup has 20 summary candidates, and we show recall over any oracle candidate for several thresholds $k \in \{1,2,3,4,5,7,10\}$.
  • Figure 5: Recall curves on SAMSum with PEGASUS backbone. The top left plot corresponds to unsupervised summarization re-ranking from \ref{tab:3b}, and the next eight plots to all zero-shot transfer summarization setups from \ref{tab:4}. Each re-ranking setup has 20 summary candidates, and we show recall over any oracle candidate for several thresholds $k \in \{1,2,3,4,5,7,10\}$.
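
The recall curves in Figures 2 to 5 can be read with the following sketch in mind. It assumes one plausible reading of "recall over any oracle candidate": a data point counts as a hit at threshold k if at least one of its best-quality candidates (ties included) appears among the top k candidates by re-ranker score. The function name `recall_at_k` and the toy scores are hypothetical and are not taken from the paper.

```python
def recall_at_k(reranker_scores, quality_scores, k):
    """Fraction of data points with an oracle candidate ranked in the top k.

    reranker_scores[i][j]: score the re-ranker gives candidate j of data point i.
    quality_scores[i][j]:  reference-based quality (e.g. mean ROUGE) of that candidate.
    Assumption: every candidate tied for the best quality counts as an oracle.
    """
    hits = 0
    for scores, quality in zip(reranker_scores, quality_scores):
        best = max(quality)
        oracle = {j for j, q in enumerate(quality) if q == best}
        # Candidate indices sorted by re-ranker score, highest first.
        ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if oracle & set(ranking[:k]):
            hits += 1
    return hits / len(reranker_scores)


# Toy data: two data points with three candidates each.
reranker_scores = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]]
quality_scores = [[0.3, 0.4, 0.2], [0.1, 0.5, 0.5]]

for k in (1, 2, 3):
    print(k, recall_at_k(reranker_scores, quality_scores, k))
```

With 20 candidates per data point, as in the figures, a random ranking would reach an oracle more often as k grows, so a useful re-ranker is one whose curve sits clearly above that baseline at small k.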