The Power of Summary-Source Alignments

Ori Ernst, Ori Shapira, Aviv Slobodkin, Sharon Adar, Mohit Bansal, Jacob Goldberger, Ran Levy, Ido Dagan

TL;DR

This paper proposes extending the summary-source alignment framework by applying it at the more fine-grained proposition span level, annotating alignment manually in a multi-document setup, and revealing the great potential of summary-source alignments to yield several datasets for at least six different tasks.

Abstract

Multi-document summarization (MDS) is a challenging task, often decomposed into subtasks of salience and redundancy detection, followed by text generation. In this context, alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data for some of the component tasks. Yet, this enabling alignment step has usually been applied heuristically at the sentence level and for a limited number of subtasks. In this paper, we propose extending the summary-source alignment framework by (1) applying it at the more fine-grained proposition span level, (2) annotating alignment manually in a multi-document setup, and (3) revealing the great potential of summary-source alignments to yield several datasets for at least six different tasks. Specifically, for each of the tasks, we release a manually annotated test set that was derived automatically from the alignment annotation. We also release development and train sets derived in the same way, but from automatically derived alignments. Using the datasets, each task is demonstrated with baseline models and corresponding evaluation metrics to spur future research on this broad challenge.

Paper Structure

This paper contains 50 sections, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: An example of proposition-level multi-document-based alignment. Aligned propositions are in the same color and formatting.
  • Figure 2: Deriving SPARK task datasets from our alignments, for a given document set (topic): (a) Alignments - aligned summary-source propositions are marked here by the same color; (b) Salience Detection - all aligned document propositions are to be selected; (c) Proposition Clustering - document propositions aligned with the same summary proposition are to be clustered; (d) Evidence Detection - a summary proposition is the input query, and the document propositions aligned with it are to be extracted as evidence; (e) Text Planning - document proposition clusters are to be grouped and ordered according to the summary sentence structure; (f) Sentence Fusion - document propositions aligning to the same summary sentence are to be fused to generate that sentence; (g) In-context Fusion - all document propositions, marked within the documents, are to be fused to generate the full summary.
  • Figure 3: The alignment annotation interface. The annotator marks a span (proposition) in the summary (right) along with all matching spans in the current document (left). To minimize cognitive load, the summary is shown next to a single document at a time, and the procedure is conducted separately for each document in the document set. In addition, visual focus is placed on one summary sentence at a time (red rectangle) to orient the process.
  • Figure 4: The manual alignment annotation on topic31 from our data. The documents have been shortened for presentation purposes.
  • Figure 5: An example of a Salience Detection instance derived from the alignments in Figure \ref{fig:alignment_text_example}. All aligned document propositions are salient. These highlighted documents can also serve as input to the In-context Passage Fusion task, where the output would be the original reference summary.
  • ...and 4 more figures
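
The derivation described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' released code: the record layout (summary sentence id, summary proposition, document id, document proposition), the placeholder proposition labels, and the function name `derive_tasks` are all assumptions made for illustration. It only shows how a single set of alignments could, in principle, yield salience, clustering, evidence, planning, and fusion instances.

```python
from collections import defaultdict

# Hypothetical alignment records (not the paper's data format):
# (summary_sentence_id, summary_proposition, doc_id, document_proposition)
alignments = [
    (0, "S0-P0", "doc1", "D1-a"),
    (0, "S0-P0", "doc2", "D2-a"),
    (0, "S0-P1", "doc1", "D1-b"),
    (1, "S1-P0", "doc2", "D2-b"),
]


def derive_tasks(alignments):
    """Derive illustrative task instances from proposition-level alignments."""
    salient = set()                     # (b) every aligned document proposition
    clusters = defaultdict(list)        # (c) doc propositions per summary proposition
    plan_by_sentence = defaultdict(list)  # (e) cluster keys grouped by summary sentence
    fusion_inputs = defaultdict(list)   # (f) doc propositions per summary sentence

    for sent_id, summ_prop, doc_id, doc_prop in alignments:
        item = (doc_id, doc_prop)
        salient.add(item)
        clusters[summ_prop].append(item)
        if summ_prop not in plan_by_sentence[sent_id]:
            plan_by_sentence[sent_id].append(summ_prop)
        if item not in fusion_inputs[sent_id]:
            fusion_inputs[sent_id].append(item)

    return {
        "salience": salient,
        "clusters": dict(clusters),
        # (d) evidence detection: summary proposition is the query,
        # its aligned document propositions are the evidence
        "evidence": {query: list(props) for query, props in clusters.items()},
        # (e) clusters grouped per sentence and ordered by summary sentence id
        "plan": [plan_by_sentence[s] for s in sorted(plan_by_sentence)],
        "fusion_inputs": dict(fusion_inputs),
    }


tasks = derive_tasks(alignments)
```

In this toy setup, `tasks["plan"]` groups the cluster keys by summary sentence, and `tasks["fusion_inputs"][0]` lists the document propositions that would be fused into the first summary sentence. In-context Fusion (g) is omitted because it requires the full document text with the propositions marked in place.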