When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks

Eve Fleisig, Rediet Abebe, Dan Klein

TL;DR

The authors build a model that predicts individual annotators' ratings of potentially offensive text and combines these predictions with the text's predicted target group to model the opinions of target group members. They find that annotators' ratings can be predicted from their demographic information and opinions on online content, without tracking identifying annotator IDs.

Abstract

Though majority vote among annotators is typically used for ground truth labels in natural language processing, annotator disagreement in tasks such as hate speech detection may reflect differences in opinion across groups, not noise. Thus, a crucial problem in hate speech detection is determining whether a statement is offensive to the demographic group that it targets, when that group may constitute a small fraction of the annotator pool. We construct a model that predicts individual annotator ratings on potentially offensive text and combines this information with the predicted target group of the text to model the opinions of target group members. We show gains across a range of metrics, including raising performance over the baseline by 22% at predicting individual annotators' ratings and by 33% at predicting variance among annotators, which provides a metric for model uncertainty downstream. We find that annotator ratings can be predicted using their demographic information and opinions on online content, without the need to track identifying annotator IDs that link each annotator to their ratings. We also find that use of non-invasive survey questions on annotators' online experiences helps to maximize privacy and minimize unnecessary collection of demographic information when predicting annotators' opinions.
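
The gap between a majority-vote label and the opinion of the targeted group can be illustrated with a small sketch. Everything here is an assumption made for illustration: the ratings are invented toy data, the group names `group_a` and `group_b` are placeholders, and the binary rating scale is not taken from the paper. The variance computed at the end is the kind of per-text disagreement signal the abstract refers to, not the paper's exact metric.

```python
from statistics import pvariance

# Invented toy ratings for one text (0 = not offensive, 1 = offensive).
# Each tuple is (annotator's group, rating); group names are placeholders.
ratings = [
    ("group_a", 0), ("group_a", 0), ("group_a", 0), ("group_a", 0),
    ("group_b", 1), ("group_b", 1),
]
target_group = "group_b"  # assumed: the group the text targets


def majority_vote(values):
    """Return 1 if strictly more than half of the ratings are 1, else 0."""
    return int(sum(values) > len(values) / 2)


all_ratings = [rating for _, rating in ratings]
target_ratings = [rating for group, rating in ratings if group == target_group]

# Label from the whole annotator pool versus label from target group members only.
majority_label = majority_vote(all_ratings)
target_label = majority_vote(target_ratings)

# Population variance of the ratings, a simple measure of annotator disagreement.
rating_variance = pvariance(all_ratings)

print(majority_label, target_label, round(rating_variance, 3))
```

In this toy pool the targeted group is a minority of annotators, so the pooled majority label and the target group's label differ, which is the situation the abstract describes.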

Paper Structure (14 sections, 2 equations, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Majority vote aggregation obscures disagreement among annotators due to their lived experiences and other factors. Modeling individual annotator opinions helps to determine when the group targeted by a possibly-hateful statement disagrees with the majority on whether the statement is harmful.
  • Figure 2: Structure of our approach. Given a piece of text and the annotator's demographic information and survey responses, the rating prediction module predicts the rating given by each annotator who labeled a piece of text (red). The target group prediction module predicts the demographic group(s) harmed by the input text (blue). At test time, our model predicts the target group for the input text, then predicts the rating that the members of the target group would give to that text (purple).
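
The test-time flow described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Annotator` class, the function `predict_target_group_rating`, and the two toy predictors are all hypothetical stand-ins for the paper's learned modules, and averaging the predicted ratings is one assumed way to summarize the target group's opinion.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, FrozenSet, List, Optional, Set


@dataclass
class Annotator:
    """Hypothetical annotator record: no ID, only demographics and survey answers."""
    demographics: FrozenSet[str]
    survey: Dict[str, int]


def predict_target_group_rating(
    text: str,
    annotators: List[Annotator],
    predict_target_groups: Callable[[str], Set[str]],
    predict_rating: Callable[[str, Annotator], float],
) -> Optional[float]:
    """Predict the target group(s), then average predicted ratings of their members."""
    target_groups = predict_target_groups(text)
    members = [a for a in annotators if a.demographics & target_groups]
    if not members:
        return None  # no annotator profile belongs to a predicted target group
    return mean(predict_rating(text, a) for a in members)


# Toy stand-ins for the two learned modules (assumptions, not the paper's models).
def toy_target_groups(text: str) -> Set[str]:
    return {g for g in ("group_a", "group_b") if g in text}


def toy_rating(text: str, annotator: Annotator) -> float:
    # Pretend that a survey answer about online experiences shifts the rating.
    return 1.0 + annotator.survey.get("sees_harmful_content_often", 0)


pool = [
    Annotator(frozenset({"group_a"}), {"sees_harmful_content_often": 0}),
    Annotator(frozenset({"group_b"}), {"sees_harmful_content_often": 1}),
    Annotator(frozenset({"group_b"}), {"sees_harmful_content_often": 1}),
]

score = predict_target_group_rating(
    "a statement about group_b", pool, toy_target_groups, toy_rating
)
no_target = predict_target_group_rating(
    "a neutral statement", pool, toy_target_groups, toy_rating
)
print(score, no_target)
```

Passing the two predictors in as callables keeps the sketch independent of any particular model; in practice each would be a trained classifier or regressor.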