On Crowdsourcing Task Design for Discourse Relation Annotation
Frances Yung, Vera Demberg
TL;DR
This work addresses the challenge of interpreting implicit discourse relations by treating annotation disagreement as informative. It compares two crowdsourcing designs for connective insertion: free-choice and forced-choice. The authors re-annotate the DiscoGeM 1.0 English subset under a forced-choice protocol and compare the result with the original free-choice annotations, quantifying how task design shapes agreement and label variety with Jensen-Shannon divergence, entropy, and the Wawa aggregation method. Free-choice annotation yields higher inter-annotator agreement, concentrated on a smaller set of common senses, whereas forced-choice annotation elicits a wider range of interpretations, including rarer senses. These differences point to method bias and its interaction with annotators' individual processing abilities. The study contributes a large re-annotated resource and offers guidance on choosing an annotation design according to whether the goal is consensus or diversity, with implications for cross-lingual annotation and for training implicit discourse relation (IDR) models. The dataset is freely downloadable, enabling further exploration of perspectivism in annotation and its impact on discourse relation recognition models.
Abstract
Interpreting implicit discourse relations involves complex reasoning: because overt connectives such as "because" or "then" are absent, readers must integrate semantic cues with background knowledge. These relations often allow multiple interpretations, which are best represented as distributions. In this study, we compare two established methods that crowdsource English implicit discourse relation annotation by connective insertion: a free-choice approach, which allows annotators to select any suitable connective, and a forced-choice approach, which asks them to select among a set of predefined options. Specifically, we re-annotate the whole DiscoGeM 1.0 corpus, which was initially annotated with the free-choice method, using the forced-choice approach. The free-choice approach allows for flexible and intuitive insertion of various context-dependent connectives. A comparison of over 130,000 annotations, however, shows that the free-choice strategy produces less diverse annotations, often converging on common labels. Analysis of the results reveals the interplay between task design and the annotators' abilities to interpret and produce discourse relations.
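The summary above names entropy and Jensen-Shannon divergence as measures of label variety and of the difference between the two annotation methods. The following is a minimal sketch of how such quantities could be computed for a single item; it is not the authors' implementation. The sense inventory, the annotation lists, and the function names are all hypothetical, and base-2 logarithms are an assumption chosen so that the divergence falls between 0 and 1.

```python
import math
from collections import Counter


def distribution(labels, senses):
    """Turn a list of sense labels into a probability vector over `senses`."""
    counts = Counter(labels)
    total = sum(counts[s] for s in senses)
    if total == 0:
        raise ValueError("no label matches the sense inventory")
    return [counts[s] / total for s in senses]


def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return sum(x * math.log2(1 / x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: H(m) - (H(p) + H(q)) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


# Hypothetical sense inventory and annotations for one item (illustration only).
senses = ["cause", "conjunction", "concession", "instantiation"]
free_choice = ["cause"] * 7 + ["conjunction"] * 3
forced_choice = (
    ["cause"] * 4 + ["conjunction"] * 3 + ["concession"] * 2 + ["instantiation"]
)

p = distribution(free_choice, senses)
q = distribution(forced_choice, senses)

print("entropy (free-choice):  ", round(entropy(p), 3))
print("entropy (forced-choice):", round(entropy(q), 3))
print("JS divergence:          ", round(js_divergence(p, q), 3))
```

In this made-up example the forced-choice distribution spreads over more senses and therefore has the higher entropy, which mirrors the direction of the reported finding but says nothing about its size. If a library routine is preferred, note that `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon distance, that is, the square root of the divergence computed here.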
