
Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations

Siyao Peng, Zihang Sun, Sebastian Loftus, Barbara Plank

TL;DR

The paper investigates human label variation in named entity recognition (NER) by analyzing expert-annotated revisions of English, Danish, and Bavarian datasets. It compiles and aligns multiple annotation versions to quantify where disagreements arise, highlighting text ambiguity and guideline updates as the key drivers in English and Danish, and annotator error as the key driver in Bavarian. A taxonomy of disagreement sources and a small student survey reveal that most conflicts concern label choices (Tag/Missing) rather than span boundaries, underscoring the distributional nature of NER ambiguity. The work provides dataset resources and advocates large-scale, distribution-aware annotation and automatic error detection to better model and evaluate NER under label variation.
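To make the idea of aligning annotation versions and sorting their disagreements more concrete, here is a minimal sketch. It is not the authors' code: it assumes two token-aligned BIO tag sequences for the same sentence, and the helper names `bio_to_spans` and `classify_disagreements` are hypothetical. The category definitions are also illustrative assumptions: "Tag" means an identical span with a different label, "Span" means overlapping spans with different boundaries, and "Missing" means an entity present in only one version (the other side is recorded as "O").

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token offsets; end is exclusive


def bio_to_spans(tags: List[str]) -> Dict[Span, str]:
    """Convert a BIO tag sequence into a {(start, end): label} mapping."""
    spans: Dict[Span, str] = {}
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        # An entity ends at "O", at "B-", or at an "I-" whose type differs
        # from the open entity (a lenient reading of ill-formed BIO).
        if tag == "O" or tag.startswith("B-") or tag[2:] != label:
            if start is not None:
                spans[(start, i)] = label
            start, label = (None, None) if tag == "O" else (i, tag[2:])
    return spans


def classify_disagreements(a: List[str], b: List[str]) -> List[Tuple[str, str, str]]:
    """Compare two annotation versions; return (category, label_a, label_b).

    Categories are hypothetical: "Tag" (same span, different label),
    "Span" (overlapping, different boundaries), "Missing" (one side only).
    """
    spans_a, spans_b = bio_to_spans(a), bio_to_spans(b)
    out: List[Tuple[str, str, str]] = []
    matched_b = set()
    for sa, la in spans_a.items():
        if sa in spans_b:  # exact boundary match: only the label can differ
            matched_b.add(sa)
            if spans_b[sa] != la:
                out.append(("Tag", la, spans_b[sa]))
            continue
        overlaps = [sb for sb in spans_b if sb[0] < sa[1] and sa[0] < sb[1]]
        for sb in overlaps:  # partial overlap: a boundary disagreement
            matched_b.add(sb)
            out.append(("Span", la, spans_b[sb]))
        if not overlaps:  # entity exists only in version a
            out.append(("Missing", la, "O"))
    for sb, lb in spans_b.items():  # entities that exist only in version b
        if sb not in matched_b:
            out.append(("Missing", "O", lb))
    return out


if __name__ == "__main__":
    v1 = ["B-ORG", "I-ORG", "O", "B-PER", "O", "B-LOC"]
    v2 = ["B-LOC", "I-LOC", "O", "B-PER", "I-PER", "O"]
    print(classify_disagreements(v1, v2))
    # → [('Tag', 'ORG', 'LOC'), ('Span', 'PER', 'PER'), ('Missing', 'LOC', 'O')]
```

Counting the first element of each triple over a corpus would give entity-level disagreement proportions of the kind Figure 1 reports, under the same assumed category definitions. A span in one version that overlaps two spans in the other is counted once per overlap, which is a simplification.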

Abstract

Named Entity Recognition (NER) is a key information extraction task with a long-standing tradition. While recent studies address and aim to correct annotation errors via re-labeling efforts, little is known about the sources of human label variation, such as text ambiguity, annotation error, or guideline divergence. This is especially the case for high-quality datasets and beyond English CoNLL03. This paper studies disagreements in expert-annotated named entity datasets for three languages: English, Danish, and Bavarian. We show that text ambiguity and artificial guideline changes are dominant factors for diverse annotations among high-quality revisions. We survey student annotations on a subset of difficult entities and substantiate the feasibility and necessity of manifold annotations for understanding named entity ambiguities from a distributional perspective.
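The abstract argues for understanding named entity ambiguities from a distributional perspective. One common way to operationalize this, assumed here rather than taken from the paper, is to aggregate several annotators' labels for the same entity into a soft label distribution and measure its spread with Shannon entropy. The function names `label_distribution` and `entropy` and the example labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def label_distribution(labels: List[str]) -> Dict[str, float]:
    """Turn one entity's labels from several annotators into probabilities."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy in bits; 0.0 means every annotator chose the same label."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


if __name__ == "__main__":
    # Hypothetical survey answers for one ambiguous entity mention.
    answers = ["ORG", "ORG", "LOC", "ORG"]
    dist = label_distribution(answers)
    print(dist)                      # → {'ORG': 0.75, 'LOC': 0.25}
    print(round(entropy(dist), 3))   # → 0.811
```

Keeping the full distribution, rather than collapsing it to the majority label "ORG", preserves the information that a quarter of annotators read the mention as a location.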

Paper Structure

This paper contains 17 sections, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Proportions of entity-level disagreements in English original-clean, conllpp-clean, reiss-clean, Danish plank-hvingelby, and Bavarian.
  • Figure 2: Proportions of top 5 label pairs in Tag and Missing disagreements in English, Danish, and Bavarian.
  • Figure 3: Proportions of label pairs (full) in Tag and Missing disagreements in English, Danish, and Bavarian.
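Figures 2 and 3 report the proportions of label pairs within Tag and Missing disagreements. Assuming the disagreements have already been extracted as (label_a, label_b) pairs, the tally behind such a chart could look like the hypothetical helper `top_label_pairs` below. Whether (ORG, LOC) and (LOC, ORG) should count as the same pair is an assumption here, controlled by the `unordered` flag.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def top_label_pairs(
    pairs: List[Pair], k: int = 5, unordered: bool = True
) -> List[Tuple[Pair, float]]:
    """Return the k most frequent label pairs with their share of all pairs.

    With unordered=True, (ORG, LOC) and (LOC, ORG) are counted as one pair.
    Ties keep first-seen order, following Counter.most_common.
    """
    if unordered:
        pairs = [(min(a, b), max(a, b)) for a, b in pairs]
    counts = Counter(pairs)
    total = sum(counts.values())
    return [(pair, n / total) for pair, n in counts.most_common(k)]


if __name__ == "__main__":
    # Hypothetical Tag/Missing disagreements as (version_a, version_b) labels.
    pairs = [("ORG", "LOC"), ("LOC", "ORG"), ("PER", "O"), ("ORG", "MISC")]
    print(top_label_pairs(pairs, k=2))
    # → [(('LOC', 'ORG'), 0.5), (('O', 'PER'), 0.25)]
```

Setting `k=5` would correspond to a top-5 view like Figure 2, while passing `k=None` (which `Counter.most_common` accepts to return all elements) would give a full breakdown like Figure 3.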