Table of Contents
Fetching ...

Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree

Harbani Jaggi, Kashyap Murali, Eve Fleisig, Erdem Bıyık

TL;DR

Three approaches to predict individual annotator ratings on the toxicity of text by incorporating individual annotator-specific information are introduced: a neural collaborative filtering approach, an in-context learning (ICL) approach, and an intermediate embedding-based architecture.

Abstract

When annotators disagree, predicting the labels given by individual annotators can capture nuances overlooked by traditional label aggregation. We introduce three approaches to predicting individual annotator ratings on the toxicity of text by incorporating individual annotator-specific information: a neural collaborative filtering (NCF) approach, an in-context learning (ICL) approach, and an intermediate embedding-based architecture. We also study the utility of demographic information for rating prediction. NCF showed limited utility; however, integrating annotator history, demographics, and survey information permits both the embedding-based architecture and ICL to substantially improve prediction accuracy, with the embedding-based architecture outperforming the other methods. We also find that, if demographics are predicted from survey information, using these imputed demographics as features performs comparably to using true demographic data. This suggests that demographics may not provide substantial information for modeling ratings beyond what is captured in survey responses. Our findings raise considerations about the relative utility of different types of annotator information and provide new approaches for modeling annotators in subjective NLP tasks.

Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree

TL;DR

Three approaches to predict individual annotator ratings on the toxicity of text by incorporating individual annotator-specific information are introduced: a neural collaborative filtering approach, an in-context learning (ICL) approach, and an intermediate embedding-based architecture.

Abstract

When annotators disagree, predicting the labels given by individual annotators can capture nuances overlooked by traditional label aggregation. We introduce three approaches to predicting individual annotator ratings on the toxicity of text by incorporating individual annotator-specific information: a neural collaborative filtering (NCF) approach, an in-context learning (ICL) approach, and an intermediate embedding-based architecture. We also study the utility of demographic information for rating prediction. NCF showed limited utility; however, integrating annotator history, demographics, and survey information permits both the embedding-based architecture and ICL to substantially improve prediction accuracy, with the embedding-based architecture outperforming the other methods. We also find that, if demographics are predicted from survey information, using these imputed demographics as features performs comparably to using true demographic data. This suggests that demographics may not provide substantial information for modeling ratings beyond what is captured in survey responses. Our findings raise considerations about the relative utility of different types of annotator information and provide new approaches for modeling annotators in subjective NLP tasks.

Paper Structure

This paper contains 11 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Design of our neural collaborative filtering (NCF) architecture. Annotator information and the text being rated were passed into an embedding model, then concatenated with the annotator embedding, and passed through a series of dense layers to predict the rating.
  • Figure 2: Design of our embedding-based architecture.
  • Figure 3: Comparison of MAE improvement with varying amounts of annotator input across all models. The text-embedding-3-large model consistently outperforms all other models and has most improvement on its own baseline.
  • Figure 4: Sample prompt for toxicity prediction model. The system prompt (in teal) defines the model's role. The user prompt (in olive) provides historical annotations, survey results, demographic information, and the text to be rated.