Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models

Itsumi Saito, Kyosuke Nishida, Kosuke Nishida, Junji Tomita

TL;DR

The paper asks whether saliency signals can improve abstractive summarization when pre-trained seq-to-seq models are used. It introduces CIT, a model that explicitly injects important tokens into the input to guide generation, and systematically compares nine combinations of saliency and pre-trained models on CNN/DM and XSum. Saliency-guided combinations, especially CIT and CIT+SE, outperform simple fine-tuning, and on CNN/DM the proposed combination exceeds the previous best model by 1.33 ROUGE-L points. The results also show that pseudo-label quality matters for highly abstractive data. Overall, the findings suggest that saliency information can improve summarization without additional pre-training, particularly for summaries that draw heavily on the source text.

Abstract

Pre-trained sequence-to-sequence (seq-to-seq) models have significantly improved the accuracy of several language generation tasks, including abstractive summarization. Although the fluency of abstractive summarization has been greatly improved by fine-tuning these models, it is not clear whether they can also identify the important parts of the source text to be included in the summary. In this study, we investigated, through extensive experiments, the effectiveness of combining the pre-trained seq-to-seq models with saliency models that identify the important parts of the source text. We also proposed a new combination model consisting of a saliency model that extracts a token sequence from a source text and a seq-to-seq model that takes the sequence as an additional input text. Experimental results showed that most of the combination models outperformed a simple fine-tuned seq-to-seq model on both the CNN/DM and XSum datasets, even when the seq-to-seq model was pre-trained on large-scale corpora. Moreover, for the CNN/DM dataset, the proposed combination model exceeded the previous best-performing model by 1.33 points on ROUGE-L.
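
To make the proposed combination concrete, the following is a minimal sketch of the input construction it implies: a saliency model scores the source tokens, a token sequence is extracted, and that sequence is given to the seq-to-seq model as additional input text. The abstract does not specify these details, so the top-k selection rule, the `[SEP]` separator, the function names, and the saliency scores below are all illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Sequence


def select_salient_tokens(
    tokens: Sequence[str], scores: Sequence[float], k: int
) -> List[str]:
    """Return the k highest-scoring tokens, kept in original source order.

    Top-k selection is an assumption made for illustration only.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    # Rank positions by score (descending); ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    # Restore source order among the selected positions.
    keep = sorted(ranked[: max(k, 0)])
    return [tokens[i] for i in keep]


def build_combined_input(
    tokens: Sequence[str], scores: Sequence[float], k: int, sep: str = "[SEP]"
) -> List[str]:
    """Prepend the extracted token sequence to the source text.

    The separator token and the ordering (salient tokens first) are
    hypothetical choices, not taken from the paper.
    """
    salient = select_salient_tokens(tokens, scores, k)
    return salient + [sep] + list(tokens)


# Toy example with made-up saliency scores standing in for a trained model.
source = ["the", "storm", "closed", "three", "airports", "on", "monday"]
saliency = [0.05, 0.90, 0.70, 0.20, 0.85, 0.02, 0.40]

combined = build_combined_input(source, saliency, k=3)
print(" ".join(combined))
# -> storm closed airports [SEP] the storm closed three airports on monday
```

In a full system, the combined sequence would be tokenized and fed to the encoder of the fine-tuned seq-to-seq model; here the scores are fixed by hand so the selection step can be inspected in isolation.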

Paper Structure

This paper contains 44 sections, 9 equations, 1 figure, 6 tables.

Figures (1)

  • Figure 1: Combinations of seq-to-seq and saliency models. Purple: Encoder. Blue: Decoder. Red: Shared encoder, a single model used for both saliency detection and encoding, in (a), (b), and (e). Yellow: Extractor, an independent saliency model that extracts important sentences $X_s$ in (c) or important tokens $C$ in (d) and (e) from the source text $X$. Each colored block represents an $M$-layer Transformer block. Gray: Linear transformation. Green: Context attention. Pink: Output trained in a supervised manner, where $S$ is the saliency score and $Y$ is the summary.