Cost-Efficient Subjective Task Annotation and Modeling through Few-Shot Annotator Adaptation

Preni Golazizian, Alireza S. Ziabari, Ali Omrani, Morteza Dehghani

TL;DR

This paper introduces a framework for annotation collection and modeling in subjective tasks that aims to minimize the annotation budget while maximizing predictive performance for each annotator. The resulting models are more equitable, with a smaller performance disparity among annotators.

Abstract

In subjective NLP tasks, where a single ground truth does not exist, the inclusion of diverse annotators becomes crucial as their unique perspectives significantly influence the annotations. In realistic scenarios, the annotation budget often becomes the main determinant of the number of perspectives (i.e., annotators) included in the data and subsequent modeling. We introduce a novel framework for annotation collection and modeling in subjective tasks that aims to minimize the annotation budget while maximizing the predictive performance for each annotator. Our framework has a two-stage design: first, we rely on a small set of annotators to build a multitask model, and second, we augment the model for a new perspective by strategically annotating a few samples per annotator. To test our framework at scale, we introduce and release a unique dataset, Moral Foundations Subjective Corpus, of 2000 Reddit posts annotated by 24 annotators for moral sentiment. We demonstrate that our framework surpasses the previous SOTA in capturing the annotators' individual perspectives with as little as 25% of the original annotation budget on two datasets. Furthermore, our framework results in more equitable models, reducing the performance disparity among annotators.
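
To make the budget argument concrete, the sketch below counts annotations under a full-coverage scheme and under a two-stage scheme like the one described in the abstract. The 2000 posts and 24 annotators come from the abstract; the stage-one group size of 4 and the 64 shots per remaining annotator are hypothetical values chosen only for illustration, and the paper's own budget accounting may differ.

```python
def full_budget(num_items: int, num_annotators: int) -> int:
    """Annotations needed when every annotator labels every item."""
    return num_items * num_annotators


def two_stage_budget(num_items: int, num_annotators: int,
                     stage1_annotators: int, shots: int) -> int:
    """Annotations needed under an assumed two-stage scheme.

    Stage 1: a small group of annotators labels every item.
    Stage 2: each remaining annotator labels only `shots` items.
    """
    if not 0 < stage1_annotators <= num_annotators:
        raise ValueError("stage1_annotators must be in 1..num_annotators")
    if not 0 <= shots <= num_items:
        raise ValueError("shots must be in 0..num_items")
    stage1 = num_items * stage1_annotators
    stage2 = shots * (num_annotators - stage1_annotators)
    return stage1 + stage2


# 2000 posts and 24 annotators are taken from the abstract.
full = full_budget(2000, 24)
# stage1_annotators=4 and shots=64 are hypothetical illustration values.
reduced = two_stage_budget(2000, 24, stage1_annotators=4, shots=64)
fraction = reduced / full
print(f"full={full}, two-stage={reduced}, fraction={fraction:.2%}")
```

Under these assumed values, most of the cost comes from stage one, so the number of annotators who label everything is the main lever on the total budget.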

Paper Structure (24 sections, 2 equations, 10 figures, 11 tables)

Figures (10)

  • Figure 1: Left: The baseline approach for annotator-level modeling, in full and reduced budget scenarios. Right: Our proposed two-stage framework, designed to achieve the outlined objectives.
  • Figure 2: Overall $F_1$ score ($F_1^{overall}$) of our framework compared to the baseline across all three base models on both datasets. With the best-performing base models, we observe a 3.8% performance gain with only 50% of the annotation budget on the Brexit dataset and a 2.24% gain with 25% of the annotation budget on the MFSC dataset.
  • Figure 3: Few-shot $F_1$ score ($F_1^{fs}$) of our framework compared to the baseline across all three base models on both datasets. With the best-performing base models, we observe an 8.47% performance gain with 83% of the annotation budget on the Brexit dataset and a 4.37% gain with 25% of the annotation budget on the MFSC dataset.
  • Figure 4: Comparison of annotator-level $F_1$ scores ($F_1^{a_i}$) on the Brexit dataset between the MTL model and our framework, using the $\mathcal{S}_{bal}$ sampling method for all budgets and shots with the RoBERTa-base model.
  • Figure 5: Abbreviations used in the race pie chart: W stands for White or European American, B for Black or African American, H for Hispanic or Latino/Latinx, P for Native Hawaiian or Pacific Islander, A for Asian or Asian American, and M for Middle Eastern or North African.
  • ...and 5 more figures