Incorporating Distributions of Discourse Structure for Long Document Abstractive Summarization
Dongqi Liu, Yifan Wang, Vera Demberg
TL;DR
The paper tackles long-document abstractive summarization by integrating explicit discourse structure into a transformer backbone. It introduces RSTformer, which uses a three-dimensional Labeled Discourse Distribution (LDD) tensor to encode both the types and the uncertainty of RST relations. The tensor is injected into Longformer’s sparse attention via per-head, relation-specific weighting, producing a more discourse-aware context representation. On BookSum Chapter, eLife, and Multi-LexSum, RSTformer with typed relations consistently outperforms the Longformer baseline and surpasses several state-of-the-art models on multiple metrics; ablations and human evaluation confirm the value of relation types and uncertainty. The approach improves sentence alignment and abstractiveness, but it also exposes a trade-off between novelty and factual consistency. The results suggest the method could carry over to seq2seq tasks beyond summarization.
Abstract
For text summarization, the role of discourse structure is pivotal in discerning the core content of a text. Regrettably, prior studies on incorporating Rhetorical Structure Theory (RST) into transformer-based summarization models consider only the nuclearity annotation, thereby overlooking the variety of discourse relation types. This paper introduces the 'RSTformer', a novel summarization model that comprehensively incorporates both the types and uncertainty of rhetorical relations. Our RST-attention mechanism, rooted in document-level rhetorical structure, is an extension of the recently devised Longformer framework. In rigorous evaluation, the proposed model significantly outperforms state-of-the-art models on several automatic metrics and in human evaluation.
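To make the idea of relation-specific attention weighting more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: it uses dense attention rather than Longformer's sparse attention, it assumes a hypothetical LDD tensor of shape `(n, n, R)` whose entry `ldd[i, j, r]` is a probability-like score that unit `i` relates to unit `j` via relation type `r`, and it assumes a hypothetical `head_to_relation` list that assigns one relation type to each attention head. The elementwise reweighting, and the choice not to renormalize afterwards, are also assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def discourse_weighted_attention(q, k, v, ldd, head_to_relation):
    """Hypothetical sketch of per-head, relation-specific attention weighting.

    q, k, v:           arrays of shape (H, n, d), one slice per head.
    ldd:               array of shape (n, n, R); assumed layout, see lead-in.
    head_to_relation:  length-H sequence of relation indices in [0, R).
    Returns an array of shape (H, n, d).
    """
    d = q.shape[-1]
    # Standard scaled dot-product attention weights, shape (H, n, n).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    # Select one (n, n) relation slice per head, giving shape (H, n, n).
    rel = np.transpose(ldd, (2, 0, 1))[head_to_relation]
    # Assumed injection step: scale each attention weight by the
    # discourse score of the relation assigned to that head.
    weighted = attn * rel
    return weighted @ v


# Small usage example with random inputs.
rng = np.random.default_rng(0)
H, n, d, R = 2, 4, 3, 3
q = rng.normal(size=(H, n, d))
k = rng.normal(size=(H, n, d))
v = rng.normal(size=(H, n, d))
ldd = rng.uniform(size=(n, n, R))
out = discourse_weighted_attention(q, k, v, ldd, head_to_relation=[0, 2])
print(out.shape)
```

In this sketch, a head whose relation slice is all ones behaves like ordinary attention, while a slice of zeros silences the head entirely. Intermediate values let uncertain discourse links contribute proportionally instead of being forced into a single hard parse.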
