Incorporating Distributions of Discourse Structure for Long Document Abstractive Summarization

Dongqi Liu; Yifan Wang; Vera Demberg

TL;DR

The paper tackles long-document abstractive summarization by integrating explicit discourse structure into a transformer backbone. It introduces RSTformer, which encodes both the types of RST relations and the uncertainty over them in a three-dimensional Labeled Discourse Distribution (LDD) tensor. This tensor is injected into Longformer’s sparse attention through per-head, relation-specific weighting, yielding a more discourse-aware context representation. On BookSum Chapter, eLife, and Multi-LexSum, RSTformer (with typed relations) consistently outperforms the Longformer baseline and surpasses several state-of-the-art models on multiple metrics; ablation studies and human evaluations confirm the value of both relation types and uncertainty. The approach improves sentence alignment and abstractiveness, although it also exposes a trade-off between novelty and factual consistency, and it appears promising for sequence-to-sequence tasks beyond summarization.
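One plausible reading of the per-head, relation-specific weighting is sketched below in plain Python. Each attention head h holds one weight per relation type r and collapses the three-dimensional LDD tensor into a two-dimensional matrix M_h[i][j] = Σ_r w[h][r] · LDD[i][j][r], which is then multiplied element-wise into that head's attention matrix S. The function names, the linear form of the weighting, and the assumption that the LDD is already aligned to token positions are illustrative assumptions, not the authors' implementation.

```python
from typing import List

# Hypothetical type aliases for readability (not from the paper).
Tensor3 = List[List[List[float]]]  # LDD[i][j][r]
Matrix = List[List[float]]


def collapse_ldd(ldd: Tensor3, head_weights: Matrix) -> List[Matrix]:
    """Collapse an n x n x R LDD tensor into one n x n matrix per attention head.

    ldd[i][j][r] is read as the probability of relation type r between
    positions i and j (assumed already mapped from EDUs to tokens).
    head_weights[h][r] is a hypothetical per-head weight for relation r.
    Returns M with M[h][i][j] = sum_r head_weights[h][r] * ldd[i][j][r].
    """
    n = len(ldd)
    num_rel = len(ldd[0][0])
    per_head = []
    for w in head_weights:
        if len(w) != num_rel:
            raise ValueError("each head needs one weight per relation type")
        per_head.append([
            [sum(w[r] * ldd[i][j][r] for r in range(num_rel)) for j in range(n)]
            for i in range(n)
        ])
    return per_head


def apply_ldd(scores: Matrix, ldd_head: Matrix) -> Matrix:
    """Element-wise (Hadamard) product of one head's attention matrix S with its LDD matrix."""
    return [[s * d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(scores, ldd_head)]


if __name__ == "__main__":
    # Toy example: 2 positions, 2 relation types; each ldd[i][j] sums to 1.
    ldd = [[[1.0, 0.0], [0.2, 0.8]],
           [[0.6, 0.4], [1.0, 0.0]]]
    heads = collapse_ldd(ldd, [[1.0, 0.0], [0.5, 0.5]])
    print(heads[0])  # [[1.0, 0.2], [0.6, 1.0]]
    print(apply_ldd([[2.0, 3.0], [4.0, 5.0]], heads[0]))
```

In this toy setup, a head with weights [1.0, 0.0] responds only to evidence for the first relation type, while [0.5, 0.5] averages both; in a trained model such weights would presumably be learned parameters.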

Abstract

For text summarization, the role of discourse structure is pivotal in discerning the core content of a text. Unfortunately, prior studies that incorporate Rhetorical Structure Theory (RST) into transformer-based summarization models consider only the nuclearity annotation and thereby overlook the variety of discourse relation types. This paper introduces RSTformer, a novel summarization model that incorporates both the types and the uncertainty of rhetorical relations. Our RST-attention mechanism, grounded in document-level rhetorical structure, extends the recently proposed Longformer framework. In both automatic metrics and human evaluation, the proposed model significantly outperforms state-of-the-art models.

Paper Structure (25 sections, 5 equations, 11 figures, 5 tables)

Introduction
Related Work
Text Summarization with RST
Text Summarization with Longformer
Proposed Approach
RST Tensor with Labeled Distributions
RST Sparse Attention
Experiments and Analysis
Experimental Setup
Parser
Datasets
Evaluation Metrics
Training and Inference
Results
Ablation Study
...and 10 more sections

Figures (11)

Figure 1: An example of an RST tree: [Rhetorical structure theory (RST) is a theory of text organization.]$^{\mathrm{EDU1}}$ [Although the RST structure is difficult to annotate,]$^{\mathrm{EDU2}}$ [there are still many scholars who have studied it.]$^{\mathrm{EDU3}}$
Figure 2: Labeled discourse distributions
Figure 3: Model architecture: a schematic diagram of how the $\mathrm{LDD}$ tensor is incorporated into the model's attention layer. $\mathrm{X}$ is the text embedding matrix, and $\mathrm{LDD}$ is combined with the attention matrix $\mathrm{S}$ by element-wise multiplication. To keep the matrix shapes consistent, we apply the same chunking method as Longformer to $\mathrm{LDD}$.
Figure 4: Sentence alignment distribution. L = Longformer, R(w/o) = RSTformer(w/o relations), R(w) = RSTformer(w relations), BC = BookSum Chapter.
Figure 5: N-gram novelty. L = Longformer, R(w/o) = RSTformer(w/o relations), R(w) = RSTformer(w relations), BC = BookSum Chapter, ML = Multi-LexSum.
...and 6 more figures
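Longformer's sliding-window attention scores each query only against keys within a one-sided window of w positions, so its local scores fit a banded layout of shape n × (2w + 1) rather than a dense n × n matrix. The sketch below shows one way a dense LDD-derived matrix could be put into that same banded layout before the element-wise product. The function names and the zero padding are assumptions, and Longformer's actual chunked implementation (overlapping chunks plus diagonal shifting) is more involved than this direct indexing.

```python
from typing import List

Matrix = List[List[float]]


def to_band(dense: Matrix, w: int, pad: float = 0.0) -> Matrix:
    """Convert a dense n x n matrix to a banded n x (2w + 1) layout.

    Column k of row i holds dense[i][i + k - w], i.e. the key at relative
    offset k - w from query i. Offsets that fall outside the sequence are
    filled with `pad` (zero here, an assumption).
    """
    n = len(dense)
    band = []
    for i in range(n):
        row = []
        for k in range(2 * w + 1):
            j = i + k - w  # absolute key position for this band column
            row.append(dense[i][j] if 0 <= j < n else pad)
        band.append(row)
    return band


def banded_hadamard(band_scores: Matrix, band_ldd: Matrix) -> Matrix:
    """Element-wise product of banded attention scores with a banded LDD matrix."""
    return [[s * d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(band_scores, band_ldd)]


if __name__ == "__main__":
    dense = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(to_band(dense, w=1))  # [[0.0, 1, 2], [4, 5, 6], [8, 9, 0.0]]
```

Because both operands share the same banded shape, the product stays aligned with the sliding-window scores; in a full sliding-window implementation the out-of-range slots are masked separately, so the padding value here matters only for illustration.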