ARTICLE: Annotator Reliability Through In-Context Learning

Sujan Dutta; Deepak Pandita; Tharindu Cyril Weerasooriya; Marcos Zampieri; Christopher M. Homan; Ashiqur R. KhudaBukhsh

TL;DR

This work proposes ARTICLE, an in-context learning (ICL) framework that estimates annotation quality through self-consistency. The findings indicate that ARTICLE is a robust method for identifying reliable annotators and thereby improving data quality.

Abstract

Ensuring annotator quality in training and evaluation data is a key part of machine learning in NLP. Tasks such as sentiment analysis and offensive speech detection are intrinsically subjective, which makes traditional quality assessment difficult: it is hard to distinguish disagreement caused by poor work from disagreement caused by differences of opinion between sincere annotators. With the goal of increasing the diversity of perspectives in annotation while ensuring consistency, we propose ARTICLE, an in-context learning (ICL) framework that estimates annotation quality through self-consistency. We evaluate this framework on two offensive speech datasets using multiple LLMs and compare its performance with traditional methods. Our findings indicate that ARTICLE can be used as a robust method for identifying reliable annotators, hence improving data quality.
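To make the idea of self-consistency concrete, the following is a minimal sketch and not the authors' implementation. It assumes a leave-one-out protocol: each label from an annotator is predicted from that annotator's remaining labels, and the fraction of matches is taken as a consistency score. The names `majority_predictor`, `self_consistency`, `consistent_annotators`, and `threshold` are hypothetical, the scoring rule is an assumption, and the toy predictor merely stands in for an LLM prompted with in-context examples.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# (comment text, label), with 1 = offensive and 0 = not offensive (assumed encoding)
Example = Tuple[str, int]
Predictor = Callable[[List[Example], str], int]


def majority_predictor(context: List[Example], text: str) -> int:
    """Toy stand-in for an LLM call: ignores the text and returns
    the most common label among the in-context examples."""
    counts = Counter(label for _, label in context)
    return counts.most_common(1)[0][0]


def self_consistency(annotations: List[Example], predict: Predictor) -> float:
    """Leave-one-out agreement between an annotator's own labels and
    predictions conditioned on that annotator's other labels."""
    if len(annotations) < 2:
        raise ValueError("need at least two annotations per annotator")
    hits = 0
    for i, (text, label) in enumerate(annotations):
        # Hold out item i and use the rest as in-context examples.
        context = annotations[:i] + annotations[i + 1:]
        hits += int(predict(context, text) == label)
    return hits / len(annotations)


def consistent_annotators(
    data: Dict[str, List[Example]], predict: Predictor, threshold: float
) -> List[str]:
    """Keep annotators whose score reaches the (hypothetical) threshold."""
    return [
        annotator
        for annotator, annotations in data.items()
        if self_consistency(annotations, predict) >= threshold
    ]


# Made-up toy data, for illustration only.
data = {
    "a1": [("c1", 1), ("c2", 1), ("c3", 1), ("c4", 1)],
    "a2": [("c1", 1), ("c2", 0), ("c3", 1), ("c4", 0)],
}
kept = consistent_annotators(data, majority_predictor, threshold=0.5)
```

In practice `predict` would wrap a call to an LLM whose prompt contains the annotator's other labeled comments. Passing it in as a function keeps the scoring logic independent of any particular model or API.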

Paper Structure (21 sections, 7 figures, 5 tables)


Introduction
Contributions.
Related Work
Methodology
Step 1: Identifying Inconsistent Annotators
Step 2: Modeling Group-level Perception
Experimental Setup
Datasets
Models
Computing Environment
Inconsistent Annotation Examples
Evaluation
Modeling Performance
Data Loss
Comparison with CT
...and 6 more sections

Figures (7)

Figure 1: Schematic Diagram of ARTICLE.
Figure 2: Group-level model performance at different $k$ values in $\mathcal{D}_\texttt{TR}$. The error bars indicate 95% confidence interval.
Figure 3: Group-level model performance at different $k$ values in $\mathcal{D}_\texttt{VOICED}$. The error bars indicate 95% confidence interval.
Figure 4: Prompt designed for ARTICLE.
Figure 5: Percentage of annotators and comments remaining at various values of $k$ in $\mathcal{D}_\texttt{TR}$ and $\mathcal{D}_\texttt{VOICED}$.
...and 2 more figures
