PROM: A Phrase-level Copying Mechanism with Pre-training for Abstractive Summarization

Xinbei Ma; Yeyun Gong; Pengcheng He; Hai Zhao; Nan Duan

TL;DR

PROM introduces a phrase-level copying mechanism that strengthens attention on overlapped $n$-grams and uses an explicit copying indicator with an auxiliary loss to improve faithfulness in abstractive summarization. Built on a Transformer backbone, PROM combines generation and copying through a learnable distribution and extends copying to $n$-gram phrases. The authors further enable zero-shot summarization by pre-training PROM on raw corpora in a self-supervised manner, constructing pseudo document–summary pairs via $D_{nat}$ and $D_{chunk}$ with EFD-based scoring, and including a lead-bias variant. Empirical results show that PROM surpasses prior copying methods in supervised fine-tuning and yields competitive or superior zero-shot performance after pre-training, with improvements in factuality and entity coverage and favorable human evaluation outcomes, suggesting cross-domain applicability.
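To make the idea of "overlapped $n$-grams" concrete, the following is a minimal sketch of how per-token copy labels could be derived from a source document and a reference summary. The function name `copy_labels`, the whitespace tokenization, and the fixed value of `n` are illustrative assumptions; the summary above does not specify how PROM builds its indicator targets.

```python
def copy_labels(source_tokens, summary_tokens, n=2):
    """Return one 0/1 label per summary token.

    A token is labeled 1 if it lies inside at least one summary n-gram
    that also occurs in the source. This is an assumed labeling rule
    for illustration, not PROM's documented procedure.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Collect every n-gram that appears in the source.
    source_ngrams = {
        tuple(source_tokens[i:i + n])
        for i in range(len(source_tokens) - n + 1)
    }
    labels = [0] * len(summary_tokens)
    # Slide a window over the summary and mark tokens of shared n-grams.
    for i in range(len(summary_tokens) - n + 1):
        if tuple(summary_tokens[i:i + n]) in source_ngrams:
            for j in range(i, i + n):
                labels[j] = 1
    return labels


source = "the cat sat on the mat".split()
summary = "a cat sat quietly on the mat".split()
print(copy_labels(source, summary, n=2))  # [0, 1, 1, 0, 1, 1, 1]
```

Labels of this kind could serve as targets for an auxiliary binary classification loss on the copying indicator, which is one plausible reading of the mechanism described above.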

Abstract

Building on the remarkable achievements of pre-trained language models in abstractive summarization, the copying mechanism has proved helpful in improving factuality, stability, and overall performance. This work proposes PROM, a new PhRase-level cOpying Mechanism that enhances attention on n-grams and can be applied to zero-shot summarization with pre-training. PROM adds an indicator layer to explicitly pick up tokens in n-grams that can be copied from the source, and calculates an auxiliary loss for the copying prediction. Empirical studies show that PROM makes significant improvements in fine-tuning on benchmarks. In the zero-shot setting, PROM is used in self-supervised pre-training on raw corpora and provides new general baselines on a wide range of summarization datasets. Further analysis shows that PROM performs more reasonable copying and contributes to faithfulness.
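As a rough illustration of how a copying mechanism can combine generation and copying in one output distribution, the sketch below uses the generic pointer-generator style mixture, in which a gate weights a vocabulary distribution against attention mass scattered onto source token ids. This is an assumption chosen for illustration: the names `mix_copy_and_generate` and `p_gen` are hypothetical, and PROM's exact formulation may differ.

```python
def mix_copy_and_generate(p_vocab, attention, source_ids, p_gen):
    """Blend a generation distribution with a copy distribution.

    p_vocab    : probabilities over the vocabulary (sums to 1)
    attention  : attention weights over source positions (sums to 1)
    source_ids : vocabulary id of the token at each source position
    p_gen      : gate in [0, 1]; 1 means pure generation, 0 pure copying
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(attention) != len(source_ids):
        raise ValueError("attention and source_ids must align")
    # Start from the down-weighted generation distribution.
    final = [p_gen * p for p in p_vocab]
    # Add the copy mass: attention on a position goes to that token's id.
    for weight, token_id in zip(attention, source_ids):
        final[token_id] += (1.0 - p_gen) * weight
    return final


p_vocab = [0.1, 0.2, 0.3, 0.4]   # toy vocabulary of four tokens
attention = [0.2, 0.5, 0.3]      # three source positions
source_ids = [3, 1, 3]           # positions 0 and 2 hold the same token
mixed = mix_copy_and_generate(p_vocab, attention, source_ids, p_gen=0.6)
print(mixed, sum(mixed))
```

Because both inputs are probability distributions and the gate is a convex weight, the result is again a distribution. In a trained model the gate would be predicted from the decoder state rather than supplied by hand.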

Paper Structure (28 sections, 8 equations, 4 figures, 13 tables)

Introduction
Related Work
Copying Mechanism
Low-Resource Summarization
Methodology
PROM
Backbone
Copying with Phrase Enhancement
Pre-training for Few-shot Setting
Experiments
Dataset
Summarization data
Pre-training Corpora
Experimental Setup
Fine-tuning Setting
...and 13 more sections

Figures (4)

Figure 1: Overview of the proposed PROM. The left part shows the architecture of our model consisting of the Encoder, Decoder, and Copying module, while the right part shows a closer look at the Copying module.
Figure 2: (a) Extractive Fragments Density & Copy Length. (b) $n$-gram Novelty.
Figure 3: $\operatorname{F}_1$ scores of copied $n$-grams on CNN/DM. "PROM(Pr)" denotes results of PROM$_{pre-train}$.
Figure 4: Position distributions of the overlaps across datasets.
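
The $n$-gram novelty reported in Figure 2(b) is commonly computed as the fraction of summary $n$-grams that do not appear in the source. The sketch below follows that common definition; the function name `ngram_novelty`, the whitespace tokenization, and the choice to count repeated $n$-grams are assumptions, since the captions above do not state the exact variant used.

```python
def ngram_novelty(source_tokens, summary_tokens, n):
    """Fraction of summary n-grams (counted with repetition) absent from the source.

    Returns 0.0 when the summary is too short to contain any n-gram.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    source_ngrams = {
        tuple(source_tokens[i:i + n])
        for i in range(len(source_tokens) - n + 1)
    }
    summary_ngrams = [
        tuple(summary_tokens[i:i + n])
        for i in range(len(summary_tokens) - n + 1)
    ]
    if not summary_ngrams:
        return 0.0
    novel = sum(1 for gram in summary_ngrams if gram not in source_ngrams)
    return novel / len(summary_ngrams)


source = "the cat sat on the mat".split()
summary = "a cat sat quietly on the mat".split()
print(ngram_novelty(source, summary, 2))  # 0.5
```

Lower novelty means more of the summary is copied verbatim, so this measure is usually read alongside faithfulness metrics rather than on its own.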
