On Crowdsourcing Task Design for Discourse Relation Annotation

Frances Yung, Vera Demberg

TL;DR

This work addresses the challenge of interpreting implicit discourse relations by treating annotation disagreement as informative and comparing two crowdsourcing designs for connective insertion: free-choice and forced-choice. The authors re-annotate the whole DiscoGeM 1.0 corpus, originally annotated with the free-choice method, under a forced-choice protocol and compare the two sets of annotations. They quantify how task design shapes agreement and label variety using Jensen-Shannon divergence, entropy, and the Wawa aggregation method. The findings show that free-choice yields higher inter-annotator agreement on a smaller set of common senses, while forced-choice elicits a wider range of interpretations, including rarer senses. This points to a method bias that interacts with annotators' individual processing abilities. The study contributes a large re-annotated resource and offers guidance for choosing an annotation design according to the goal, consensus or diversity, with implications for cross-lingual annotation and for training implicit discourse relation (IDR) models. The dataset is freely downloadable, enabling further exploration of perspectivism in annotation and its impact on discourse relation recognition models.

Abstract

Interpreting implicit discourse relations involves complex reasoning, requiring the integration of semantic cues with background knowledge, as overt connectives like because or then are absent. These relations often allow multiple interpretations, best represented as distributions. In this study, we compare two established methods that crowdsource English implicit discourse relation annotation by connective insertion: a free-choice approach, which allows annotators to select any suitable connective, and a forced-choice approach, which asks them to select among a set of predefined options. Specifically, we re-annotate the whole DiscoGeM 1.0 corpus -- initially annotated with the free-choice method -- using the forced-choice approach. The free-choice approach allows for flexible and intuitive insertion of various connectives, which are context-dependent. Comparison among over 130,000 annotations, however, shows that the free-choice strategy produces less diverse annotations, often converging on common labels. Analysis of the results reveals the interplay between task design and the annotators' abilities to interpret and produce discourse relations.
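To make the idea of relations "best represented as distributions" concrete, the sketch below turns the labels collected for one item into a probability distribution, then computes its entropy (how spread out the labels are) and the Jensen-Shannon divergence between the distributions from two annotation methods. It is a minimal illustration, not the authors' evaluation code: the function names and the sense labels and counts in the usage example are hypothetical, and base-2 logarithms are an assumption.

```python
import math
from collections import Counter


def label_distribution(labels, label_set):
    """Relative frequency of each label in label_set (hypothetical helper)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return [counts[label] / total for label in label_set]


def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: H(M) - (H(P) + H(Q)) / 2, M = (P + Q) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


# Hypothetical annotations of a single item under each method (made-up data).
senses = ["conjunction", "result", "concession"]
free_choice = ["conjunction"] * 8 + ["result"] * 2
forced_choice = ["conjunction"] * 4 + ["result"] * 3 + ["concession"] * 3

p_free = label_distribution(free_choice, senses)
p_forced = label_distribution(forced_choice, senses)

print("entropy, free-choice:  ", round(entropy(p_free), 3))
print("entropy, forced-choice:", round(entropy(p_forced), 3))
print("JS divergence:         ", round(js_divergence(p_free, p_forced), 3))
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared labels), so values are comparable across items with different numbers of annotations.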

Paper Structure

This paper contains 6 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Distribution of the unaggregated annotations
  • Figure 2: Confusion matrix of the aggregated annotations from both methods, with labels merged at level-2 granularity
  • Figure 3: Total number of unique relations annotated by the same workers on the same set of items
  • Figure 4: Examples taken from DiscoGeM where the annotations by the forced- and free-choice approaches are alternative interpretations. The English forced-choice annotations come from the current study and those from the other languages come from DiscoGeM 2.0. The English free-choice annotations come from DiscoGeM 1.0.
  • Figure 5: Comparison between 3,233 annotations by the same workers on the same items using both methods