SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization
Yao Zhao, Mohammad Saleh, Peter J. Liu
TL;DR
SEAL tackles long-form neural abstractive summarization (LF-NAS) with a segment-wise extractive-abstractive model that, for each output segment, attends to a sparse set of input snippets. The model is trained end-to-end, jointly optimizing an abstractive loss and weak extractive supervision from proxy labels, which enables dynamic content selection across very long inputs (up to 100,000 tokens). It achieves state-of-the-art results on arXiv and PubMed and outperforms strong baselines on a new long-input dataset, Search2Wiki, while improving interpretability through explicit snippet selection. The work unifies single-document summarization (SDS) and multi-document summarization (MDS), extends extractive-abstractive paradigms, and shows that longer input context coupled with targeted content selection yields substantial gains for LF-NAS.
Abstract
Most prior work in the sequence-to-sequence paradigm focused on datasets with input sequence lengths in the hundreds of tokens due to the computational constraints of common RNN and Transformer architectures. In this paper, we study long-form abstractive text summarization, a sequence-to-sequence setting with input sequence lengths up to 100,000 tokens and output sequence lengths up to 768 tokens. We propose SEAL, a Transformer-based model, featuring a new encoder-decoder attention that dynamically extracts/selects input snippets to sparsely attend to for each output segment. Using only the original documents and summaries, we derive proxy labels that provide weak supervision for extractive layers simultaneously with regular supervision from abstractive summaries. The SEAL model achieves state-of-the-art results on existing long-form summarization tasks, and outperforms strong baseline models on a new dataset/task we introduce, Search2Wiki, with much longer input text. Since content selection is explicit in the SEAL model, a desirable side effect is that the selection can be inspected for enhanced interpretability.
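The two ideas in the abstract, proxy labels derived from document/summary pairs and sparse per-segment snippet selection, can be illustrated with a small sketch. This is not the SEAL implementation: the abstract does not say how proxy labels are computed, so token overlap is used here purely as an assumed stand-in, and the snippet scores (which in SEAL come from learned extractive layers) are simply passed in as numbers. All function names are hypothetical.

```python
from collections import Counter


def split_snippets(tokens, snippet_len):
    """Chop a long token list into fixed-length snippets (the last may be shorter)."""
    return [tokens[i:i + snippet_len] for i in range(0, len(tokens), snippet_len)]


def overlap_score(snippet, segment):
    """Fraction of the segment's tokens (with multiplicity) also found in the snippet.

    ASSUMPTION: token overlap is only an illustrative proxy; the abstract
    does not specify the similarity measure behind SEAL's proxy labels.
    """
    if not segment:
        return 0.0
    snippet_counts = Counter(snippet)
    segment_counts = Counter(segment)
    shared = sum(min(c, snippet_counts[t]) for t, c in segment_counts.items())
    return shared / len(segment)


def rank_top_k(scores, k):
    """Indices of the k highest scores; ties are broken by earlier position."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def proxy_labels(snippets, segment, k):
    """Weak extractive labels: 1 for the k snippets that best overlap the segment."""
    scores = [overlap_score(s, segment) for s in snippets]
    positives = set(rank_top_k(scores, k))
    return [1 if i in positives else 0 for i in range(len(snippets))]


def select_for_segment(snippets, scores, k):
    """Keep the k best-scoring snippets, in document order, as the sparse
    context a decoder would attend to while writing one output segment."""
    chosen = sorted(rank_top_k(scores, k))
    context = [tok for i in chosen for tok in snippets[i]]
    return chosen, context


document = (
    "the model reads long inputs . it selects a few snippets . "
    "then it writes each summary segment . unrelated filler text here"
).split()
snippets = split_snippets(document, snippet_len=6)
segment = "it selects snippets".split()

labels = proxy_labels(snippets, segment, k=2)
# Hypothetical scores standing in for a learned extractive scorer.
chosen, context = select_for_segment(snippets, [0.1, 0.9, 0.3, 0.0], k=2)
print(labels, chosen, len(context))
```

In this toy setup, the labels would serve as targets for the extractive scorer during training, while the selection step would be repeated for every output segment, so the decoder's attention cost depends on `k` and the snippet length rather than on the full input length.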
