Annotator-Centric Active Learning for Subjective NLP Tasks

Michiel van der Meer, Neele Falk, Pradeep K. Murukannaiah, Enrico Liscio

TL;DR

This paper introduces Annotator-Centric Active Learning (ACAL), which adds an annotator selection strategy after data sampling. The aims are to efficiently approximate the full diversity of human judgments and to assess model performance with annotator-centric metrics that value minority and majority perspectives equally.

Abstract

Active Learning (AL) addresses the high costs of collecting human annotations by strategically annotating the most informative samples. However, for subjective NLP tasks, incorporating a wide range of perspectives in the annotation process is crucial to capture the variability in human judgments. We introduce Annotator-Centric Active Learning (ACAL), which incorporates an annotator selection strategy following data sampling. Our objective is two-fold: 1) to efficiently approximate the full diversity of human judgments, and 2) to assess model performance using annotator-centric metrics, which value minority and majority perspectives equally. We experiment with multiple annotator selection strategies across seven subjective NLP tasks, employing both traditional and novel, human-centered evaluation metrics. Our findings indicate that ACAL improves data efficiency and excels in annotator-centric performance evaluations. However, its success depends on the availability of a sufficiently large and diverse pool of annotators to sample from.
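
The two-step structure described above (select samples, then select annotators for them) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the uniform-random sample selection, the "least-used annotator" rule, and the choice of one annotator per sample are all assumptions made for the example.

```python
import random
from collections import Counter

def select_samples(unlabeled_ids, batch_size, rng):
    """Sample selection step (placeholder strategy: uniform random)."""
    return rng.sample(sorted(unlabeled_ids), batch_size)

def select_annotator(available, counts, rng):
    """Annotator selection step (hypothetical strategy, not from the paper):
    prefer the annotator who has been queried least often so far."""
    fewest = min(counts[a] for a in available)
    return rng.choice([a for a in available if counts[a] == fewest])

def acal_round(pool, unlabeled_ids, counts, batch_size, rng):
    """One round: pick samples, then pick one annotator per picked sample.

    pool maps sample id -> {annotator id: label}; counts tracks how often
    each annotator has been queried. Returns (sample, annotator, label) triples.
    """
    collected = []
    for sid in select_samples(unlabeled_ids, batch_size, rng):
        annotator = select_annotator(sorted(pool[sid]), counts, rng)
        counts[annotator] += 1
        collected.append((sid, annotator, pool[sid][annotator]))
        unlabeled_ids.discard(sid)  # the sample leaves the unlabeled pool
    return collected

# Toy data: three samples, each with labels from several annotators.
toy_pool = {
    "s1": {"ann_a": 1, "ann_b": 0, "ann_c": 1},
    "s2": {"ann_a": 0, "ann_b": 0},
    "s3": {"ann_b": 1, "ann_c": 1},
}
toy_unlabeled = set(toy_pool)
toy_counts = Counter()
toy_batch = acal_round(toy_pool, toy_unlabeled, toy_counts, 2, random.Random(0))
```

In a plain AL setup the second step would be replaced by a single oracle label per sample; here the annotator choice is an explicit, swappable function, which is the point the abstract makes about ACAL.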

Paper Structure

This paper contains 32 sections, 1 equation, 13 figures, 6 tables, 2 algorithms.

Figures (13)

  • Figure 1: Active Learning (AL) approaches (left) use a sample selection strategy to pick samples to be annotated by an oracle. The Annotator-Centric Active Learning (ACAL) approach (right) extends AL by introducing an annotator selection strategy to choose the annotators who annotate the selected samples.
  • Figure 2: Learning curves showing model performance on the validation set. On DICES (upper), ACAL approaches are quicker than AL in obtaining similar performance to passive learning. On MHS (lower), ACAL surpasses passive learning in $F_1$ when data has high disagreement.
  • Figure 3: Selected plots showing the $F_1^a$ and $JS^w$ performance on the validation set during the ACAL and AL iterations for DICES, MFTC (care), and MHS (dehumanize). Higher $F_1^a$ is better, lower $JS^w$ is better. Y-axes are scaled to highlight the relative performance to PL.
  • Figure 4: Proportion of data samples that result in higher or lower entropy than the target label distribution per ACAL strategy.
  • Figure 5: Comparison of ACAL, AL, and PL across different MFTC and MHS tasks. Higher $F_1^a$ is better, and lower $JS^w$ is better.
  • ...and 8 more figures
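
The captions above compare label distributions through entropy (Figure 4) and a Jensen-Shannon-based score (Figures 3 and 5). The exact definitions of $F_1^a$ and $JS^w$ are not given in this excerpt, so the sketch below only shows the standard building blocks: Shannon entropy of a label distribution and the plain Jensen-Shannon divergence between two distributions. The function names and the example distributions are assumptions for illustration.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution p."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def js_divergence(p, q):
    """Plain Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical example: annotators split 50/50, model predicts 90/10.
target = [0.5, 0.5]
predicted = [0.9, 0.1]
lower_entropy_than_target = entropy(predicted) < entropy(target)
divergence = js_divergence(predicted, target)
```

A prediction that collapses a split judgment into a confident majority label has lower entropy than the target and a nonzero divergence from it, which is the kind of gap the lower-is-better $JS^w$ curves are meant to expose.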