Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language

Xinmeng Hou

TL;DR

A prescriptive annotation benchmark grounded in humanities research is introduced to ensure consistent, unbiased labeling of offensive language, particularly for casual and non-mainstream language uses.

Abstract

This study introduces a prescriptive annotation benchmark grounded in humanities research to ensure consistent, unbiased labeling of offensive language, particularly for casual and non-mainstream language uses. We contribute two newly annotated datasets that achieve higher inter-annotator agreement between human and large language model (LLM) annotations than the original datasets, which were based on descriptive instructions. Our experiments show that LLMs can serve as effective alternatives when professional annotators are unavailable. Moreover, smaller models fine-tuned on multi-source LLM-annotated data outperform models trained on larger, single-source human-annotated datasets. These findings highlight the value of structured guidelines in reducing subjective variability, maintaining performance with limited data, and embracing language diversity. Content Warning: This article analyzes offensive language solely for academic purposes. Discretion is advised.

Paper Structure

This paper contains 26 sections, 1 equation, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Research Design: This research establishes standardized criteria for toxic language annotation and analyzes inter-annotator reliability. Experiments on BERT models across language types indicate the broader applicability of the proposed annotation criteria, even with limited resources.
  • Figure 2: Confusion Matrix on Direction Intent Annotation
  • Figure 3: Confusion Matrix on Aggression Annotation
  • Figure 4: Confusion Matrix on Toxicity Annotation with Criteria
  • Figure 5: Confusion Matrix on Toxicity Annotation without Criteria
  • ...and 4 more figures