Rethinking Transformer-based Multi-document Summarization: An Empirical Investigation

Congbo Ma; Wei Emma Zhang; Dileepa Pitawela; Haojie Zhuang; Yanfeng Shu

TL;DR

This paper analyzes how input formatting, architectural choices, encoder and decoder perturbations, training schemes, and repetition during generation shape multi-document summarization (MDS) performance. It conducts five targeted experiments on Multi-XScience and Multi-News, using 11 metrics and entropy-based uncertainty to contrast flat and hierarchical Transformer structures and to analyze separators and noise sensitivity. Key findings: document boundary separators can help hierarchical models but harm flat ones; flat Transformers perform well on shorter documents; the decoder is more sensitive to noise than the encoder; pretrain-finetune training reliably boosts quality; and repetition correlates with higher uncertainty. These insights inform design guidelines for robust, scalable MDS systems and point to future directions such as focused decoder improvements, finer feature granularity, and strategies to mitigate repetition and uncertainty.
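As a rough illustration of the input formatting contrast mentioned above, the sketch below builds a flat input (all documents concatenated into one sequence, optionally with a boundary separator) and a hierarchical input (one token list per document). The separator string `<doc-sep>`, the whitespace tokenization, and both function names are assumptions made for illustration; they are not taken from the paper.

```python
# Hypothetical separator token; the paper's actual separator is not shown here.
DOC_SEP = "<doc-sep>"


def flat_input(docs, use_separator=True):
    """Concatenate all documents into one flat string for a flat Transformer."""
    joiner = f" {DOC_SEP} " if use_separator else " "
    return joiner.join(d.strip() for d in docs)


def hierarchical_input(docs):
    """Keep document boundaries explicit: one whitespace-token list per document."""
    return [d.strip().split() for d in docs]


docs = ["first source document", "second source document"]
print(flat_input(docs))
print(flat_input(docs, use_separator=False))
print(hierarchical_input(docs))
```

In the flat case the separator is the only signal of where one document ends, while the hierarchical case preserves boundaries in the data structure itself.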

Abstract

Transformer-based models have driven the growth of multi-document summarization (MDS). Given the huge impact and widespread adoption of Transformer-based models in natural language processing tasks, investigating their performance and behavior in the context of MDS is crucial for advancing the field and improving summary quality. To thoroughly examine the behavior of Transformer-based MDS models, this paper presents five empirical studies: (1) quantitatively measuring the impact of document boundary separators; (2) exploring the effectiveness of different mainstream Transformer structures; (3) examining the sensitivity of the encoder and decoder; (4) discussing different training strategies; and (5) investigating repetition in summary generation. Experimental results on prevalent MDS datasets with eleven evaluation metrics show the influence of document boundary separators, of the granularity of features at different levels, and of different model training strategies. The results also reveal that the decoder is more sensitive to noise than the encoder. This underscores the important role of the decoder and suggests a potential direction for future MDS research. Furthermore, the results indicate that repetition in generated summaries correlates with high uncertainty scores.
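To make the link between uncertainty and repetition concrete, here is a minimal sketch that assumes uncertainty is the Shannon entropy of the decoder's next-token distribution and that repetition is the fraction of tokens repeating an earlier token. Both definitions, the function names, and the toy probabilities are assumptions; the paper's exact formulas are not reproduced in this section.

```python
import math
from collections import Counter


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def repetition_ratio(tokens):
    """Fraction of tokens that repeat an earlier token in the same summary."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(tokens)


# Toy distributions over a 4-token vocabulary (illustrative values only).
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(token_entropy(confident), token_entropy(uncertain))
print(repetition_ratio("the cat the cat the".split()))
```

A peaked distribution yields low entropy and a uniform one yields the maximum, so per-token entropies can be compared against a repetition measure for each generated summary.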

Paper Structure (22 sections, 2 equations, 5 figures, 4 tables)

Introduction
Methodology
The Measurable Impact of Document Separators
The Effectiveness of Different Transformer Structures
The Sensitivity of Encoder and Decoder
Different Training Strategies
Repetition in Document Generation
Empirical Studies and Analyses
Settings for Empirical Studies
Impact of Document Separators
Quantitative Performance on Different Transformer Structures
Quantitative Performance on the Sensitivity of Encoder and Decoder
Quantitative Performance of Different Training Strategies
The Relation Between Repetition and Uncertainty
Conclusion and Discussion
...and 7 more sections

Figures (5)

Figure 1: The uncertainty scores of VTC on Multi-News and Multi-XScience. The x-axis shows the uncertainty score and the y-axis shows the number of tokens.
Figure 2: Performance variation of document-level (green line) and sentence-level (orange line) HT models on the Multi-XScience (left) and Multi-News (right) datasets. BLEU, Redundancy and Relevance are scaled to the range 0 to 0.6 so that all points fit within the plot boundary.
Figure 3: PCA visualization of the features of VTC, VTC with self-supervised training, and VTC with finetuning after self-supervised training.
Figure 4: The relationship between uncertainty scores and token repetitions on different summaries.
Figure 5: t-SNE visualization of two embedding spaces on the Multi-News dataset with VT, VTC and HT models: (1) token representations before feeding into the Transformer encoder; (2) token representations after feeding into the Transformer encoder. The first row shows the visualizations with document separators and the second row shows the visualizations without document separators.
