Hierarchical Transformers for Multi-Document Summarization
Yang Liu, Mirella Lapata
TL;DR
This paper tackles abstractive multi-document summarization by introducing a hierarchical Transformer that encodes and relates multiple source paragraphs through local intra-paragraph and global inter-paragraph attention. A learning-based paragraph ranking stage selects informative inputs, while a graph-informed attention mechanism allows external lexical similarity or discourse graphs to guide cross-document reasoning. Evaluations on the WikiSum dataset show substantial improvements over strong baselines, with additional gains when using discourse graphs and longer input contexts. The approach demonstrates scalable modeling of cross-document structure and suggests promising directions for applying hierarchical Transformers to question answering and related inference tasks.
Abstract
In this paper, we develop a neural summarization model which can effectively process multiple input documents and distill abstractive summaries. Our model augments a previously proposed Transformer architecture with the ability to encode documents in a hierarchical manner. We represent cross-document relationships via an attention mechanism which allows information to be shared, as opposed to simply concatenating text spans and processing them as a flat sequence. Our model learns latent dependencies among textual units, but can also take advantage of explicit graph representations focusing on similarity or discourse relations. Empirical results on the WikiSum dataset demonstrate that the proposed architecture brings substantial improvements over several strong baselines.
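
The hierarchical idea described above can be illustrated with a minimal sketch. It is not the model's actual implementation: it assumes plain scaled dot-product attention with no learned projections, a single head, and mean pooling to obtain paragraph vectors. All function names (`attend`, `mean_pool`, `encode`) are hypothetical. Tokens first attend within their own paragraph; the pooled paragraph vectors then attend to each other, optionally using a row-normalized graph (for example, similarity or discourse weights) in place of the learned attention distribution.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]

def mean_pool(vectors):
    n = len(vectors)
    return [sum(vec[d] for vec in vectors) / n for d in range(len(vectors[0]))]

def attend(vectors, graph=None):
    """Self-attention over `vectors`.

    If `graph` is given (assumption: a square matrix of non-negative edge
    weights), each row is normalized and used as the attention distribution.
    Rows that sum to zero fall back to dot-product attention.
    """
    dim = len(vectors[0])
    outputs = []
    for i, query in enumerate(vectors):
        row_total = sum(graph[i]) if graph is not None else 0.0
        if row_total > 0:
            weights = [g / row_total for g in graph[i]]
        else:
            weights = softmax([dot(query, key) / math.sqrt(dim) for key in vectors])
        outputs.append(weighted_sum(weights, vectors))
    return outputs

def encode(paragraphs, graph=None):
    # Local step: tokens attend only within their own paragraph.
    local = [attend(tokens) for tokens in paragraphs]
    # Pool each paragraph into one vector (mean pooling is an assumption here).
    paragraph_vectors = [mean_pool(tokens) for tokens in local]
    # Global step: paragraphs exchange information with each other.
    return attend(paragraph_vectors, graph)

paragraphs = [
    [[1.0, 0.0], [0.0, 1.0]],  # paragraph 0: two token vectors
    [[2.0, 2.0]],              # paragraph 1: one token vector
]
latent = encode(paragraphs)                               # learned-style attention
guided = encode(paragraphs, graph=[[1.0, 0.0], [0.0, 1.0]])  # graph-guided attention
```

With the identity graph used for `guided`, each paragraph attends only to itself, so the global step leaves the pooled paragraph vectors unchanged. A denser graph would mix information across paragraphs according to its edge weights.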
