PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
Wen Xiao, Iz Beltagy, Giuseppe Carenini, Arman Cohan
TL;DR
PRIMERA introduces a pyramid-based masked-sentence pre-training approach for multi-document summarization. It uses a simple input structure that concatenates the documents of a cluster, and it processes that input with the Longformer-Encoder-Decoder (LED) so that whole document clusters can be handled efficiently. The core novelty, Entity Pyramid Masking, uses how often entities recur across the documents of a cluster to select salient sentences. These sentences are then masked and serve as targets for a Gap Sentence Generation objective, which yields strong performance in zero-shot, few-shot, and fully supervised settings. Across six datasets from three domains, PRIMERA outperforms state-of-the-art pre-trained and dataset-specific models in most settings, with notable gains in low-resource settings and favorable human-evaluation results. The work highlights the value of pre-training that is aware of cross-document saliency and offers a practical, scalable approach to multi-document summarization that does not rely on heavy dataset-specific architectures.
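The entity-pyramid selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: entities are supplied by hand rather than by an NER system, mentions are found by naive case-insensitive substring matching, only entities that occur in at least two documents are kept, and each entity's sentence is chosen with a simple unigram-overlap score against the other documents. The mask token `<sent-mask>` and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

MASK = "<sent-mask>"  # hypothetical mask token, not taken from the released code


def tokens(text: str) -> Set[str]:
    """Lowercased word set with edge punctuation stripped (crude tokenizer)."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {w for w in words if w}


def overlap_with_rest(sent: str, doc_idx: int, cluster: List[List[str]]) -> float:
    """Fraction of the sentence's words that also occur in the other documents.

    An assumed, simplified saliency score; the paper's exact criterion may differ.
    """
    sent_toks = tokens(sent)
    if not sent_toks:
        return 0.0
    rest: Set[str] = set()
    for j, doc in enumerate(cluster):
        if j != doc_idx:
            for s in doc:
                rest |= tokens(s)
    return len(sent_toks & rest) / len(sent_toks)


def entity_pyramid_mask(
    cluster: List[List[str]], entities: List[str], num_masks: int
) -> Tuple[List[List[str]], str]:
    """Mask sentences chosen via an entity pyramid; return (masked docs, target)."""
    # 1. Build the pyramid: count how many documents mention each entity.
    doc_freq: Counter = Counter()
    for doc in cluster:
        text = " ".join(doc).lower()
        for ent in entities:
            if ent.lower() in text:
                doc_freq[ent] += 1
    # Most widely shared entities first (ties broken alphabetically);
    # entities found in only one document are dropped (assumed threshold).
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    pyramid = [ent for ent, count in ranked if count >= 2]

    # 2. For each entity, pick the best-scoring unselected sentence mentioning it.
    selected: Set[Tuple[int, int]] = set()
    for ent in pyramid:
        if len(selected) >= num_masks:
            break
        candidates = [
            (i, k)
            for i, doc in enumerate(cluster)
            for k, s in enumerate(doc)
            if ent.lower() in s.lower() and (i, k) not in selected
        ]
        if candidates:
            best = max(
                candidates,
                key=lambda ik: overlap_with_rest(cluster[ik[0]][ik[1]], ik[0], cluster),
            )
            selected.add(best)

    # 3. Replace the selected sentences with the mask token; the generation
    #    target is the masked sentences in document order.
    masked = [
        [MASK if (i, k) in selected else s for k, s in enumerate(doc)]
        for i, doc in enumerate(cluster)
    ]
    target = " ".join(cluster[i][k] for i, k in sorted(selected))
    return masked, target


if __name__ == "__main__":
    cluster = [
        ["Storm Ana hit Portugal on Monday.", "Schools closed early."],
        ["Portugal declared an emergency after Storm Ana.", "Local markets were calm."],
        ["Storm Ana moved toward Spain.", "Officials in Portugal reported damage."],
    ]
    masked, target = entity_pyramid_mask(cluster, ["Storm Ana", "Portugal", "Spain"], 1)
    print(target)  # Storm Ana hit Portugal on Monday.
```

In this toy cluster, "Spain" appears in only one document and is ignored, while "Portugal" and "Storm Ana" sit at the top of the pyramid. The masked sentence is the one whose words are most shared with the rest of the cluster, which mirrors the idea of favoring cross-document salient content.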
Abstract
We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of labeled fine-tuning data. PRIMERA uses our newly proposed pre-training objective, which is designed to teach the model to connect and aggregate information across documents. It also uses efficient encoder-decoder transformers to simplify the processing of concatenated input documents. In extensive experiments on 6 multi-document summarization datasets from 3 different domains in zero-shot, few-shot, and fully supervised settings, PRIMERA outperforms current state-of-the-art dataset-specific and pre-trained models in most of these settings by large margins. The code and pre-trained models are available at https://github.com/allenai/PRIMER.
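The concatenation-based input can be sketched in a few lines. This sketch assumes the documents are already tokenized, that each document receives an equal share of the length budget, and that a separator token (here the hypothetical `<doc-sep>`) marks document boundaries. The returned 0/1 mask marks the positions that would receive global attention in a Longformer-style encoder. These details are illustrative assumptions rather than specifics stated in the abstract.

```python
from typing import List, Tuple

DOC_SEP = "<doc-sep>"  # hypothetical separator token name


def build_cluster_input(docs: List[List[str]], max_len: int) -> Tuple[List[str], List[int]]:
    """Concatenate tokenized documents into one sequence for a long-input encoder.

    Each document is truncated to an equal share of `max_len` (an assumed policy),
    documents are joined by DOC_SEP, and a 0/1 mask flags the separator positions
    that would get global attention in a Longformer-style encoder.
    """
    if not docs:
        return [], []
    # Reserve one position for each separator between consecutive documents.
    budget = (max_len - (len(docs) - 1)) // len(docs)
    if budget <= 0:
        raise ValueError("max_len is too small for this many documents")
    seq: List[str] = []
    for i, doc in enumerate(docs):
        if i > 0:
            seq.append(DOC_SEP)
        seq.extend(doc[:budget])  # keep only the first `budget` tokens
    global_mask = [1 if tok == DOC_SEP else 0 for tok in seq]
    return seq, global_mask


if __name__ == "__main__":
    seq, mask = build_cluster_input([["a", "b", "c", "d"], ["e", "f"], ["g", "h", "i"]], 8)
    print(seq)   # ['a', 'b', '<doc-sep>', 'e', 'f', '<doc-sep>', 'g', 'h']
    print(mask)  # [0, 0, 1, 0, 0, 1, 0, 0]
```

An equal per-document budget keeps every document represented in the input at the cost of dropping the tail of longer documents; other truncation policies are possible.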
