Not Worth Mentioning? A Pilot Study on Salient Proposition Annotation

Amir Zeldes, Katherine Conhaim, Lauren Levine

Abstract

Despite a long tradition of work on extractive summarization, which by nature aims to recover the most important propositions in a text, little work has been done on operationalizing graded proposition salience in naturally occurring data. In this paper, we adopt graded summarization-based salience as a metric from previous work on Salient Entity Extraction (SEE) and adapt it to quantify proposition salience. We define the annotation task, apply it to a small multi-genre dataset, evaluate agreement and carry out a preliminary study of the relationship between our metric and notions of discourse unit centrality in discourse parsing following Rhetorical Structure Theory (RST).

Paper Structure

This paper contains 13 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Annotation interface for summary-wise salient proposition alignment.
  • Figure 2: Distribution of salience scores.
  • Figure 3: RST centrality example. Graph distance from the root is indicated in blue, e.g. [d=1] means one horizontal edge away from the root, which is unit [2].
  • Figure 4: Fragment of an RST tree with units shaded by salience score: 5 = red, 3 = orange, 1 = yellow.