Quantifying and Attributing Polarization to Annotator Groups

Dimitris Tsirmpas, John Pavlopoulos

TL;DR

This paper introduces apunim, a quantitative metric that attributes polarization in annotations to specific annotator subgroups, addressing the shortcomings of traditional agreement measures in imbalanced and multi-label settings. For each subgroup, it compares a priori polarization (over all annotations) with a posteriori polarization (conditioned on the subgroup), and pairs this with a p-value test of subgroup significance, enabling robust cross-dataset comparisons. Applying the method to three hate-speech datasets and one toxicity dataset shows that annotator race/ethnicity is a strong driver of polarization, while education and religiosity shape intra-group and inter-group disagreement patterns. The authors release an open-source Python library, estimate the minimum number of annotators needed for robust results, and discuss practical implications for annotation design and fairness in NLP tasks.

Abstract

Current annotation agreement metrics are not well suited to inter-group analysis, are sensitive to group-size imbalances, and are restricted to single-annotation settings. These restrictions make them insufficient for many subjective tasks, such as toxicity and hate-speech detection. We therefore introduce a quantitative metric, paired with a statistical significance test, that attributes polarization to various annotator groups. Our metric enables direct comparisons between heavily imbalanced sociodemographic and ideological subgroups across different datasets and tasks, and also supports analysis in multi-label settings. We apply this metric to three hate-speech datasets and one toxicity-detection dataset, and find that: (1) polarization is strongly and persistently attributed to annotator race, especially on the hate-speech task; (2) religious annotators do not fundamentally disagree with each other, but do disagree with other annotators, a trend that gradually diminishes and then reverses for irreligious annotators; (3) less educated annotators are more subjective, while more educated annotators tend to agree more among themselves. Overall, our results are consistent with existing findings on annotation patterns for various subgroups. Finally, we estimate the minimum number of annotators needed to obtain robust results, and provide an open-source Python library that implements our metric.
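The abstract describes attribution only at a high level. As a rough illustration, the sketch below computes, for each item, an a priori polarization (all annotators) and an a posteriori polarization (one group's annotations only), averages both over the dataset, and takes their difference; it then estimates a p-value with a within-item permutation test. Every concrete choice here is an assumption rather than the authors' definition: the normalized distance-from-unimodality score used as a stand-in polarization measure, the difference-of-means attribution, the permutation test, the data layout, and the names `polarization_score`, `attribution`, and `permutation_p_value`, which are hypothetical and not the API of the authors' library.

```python
import random
from statistics import mean

# Hypothetical data layout (assumption): a dataset is a list of items, and
# each item is a list of (rating, group) pairs on an ordinal scale 1..k.


def polarization_score(ratings, k):
    """Assumed stand-in polarization measure: normalized distance from
    unimodality of the rating histogram, in [0, 1]."""
    hist = [0] * k
    for r in ratings:
        hist[r - 1] += 1
    peak = max(range(k), key=lambda i: hist[i])
    if hist[peak] == 0:
        return 0.0
    worst = 0
    # Walk away from the peak in both directions. A unimodal histogram never
    # rises again after dipping, so record the largest such rise.
    for direction in (range(peak + 1, k), range(peak - 1, -1, -1)):
        running_min = hist[peak]
        for i in direction:
            worst = max(worst, hist[i] - running_min)
            running_min = min(running_min, hist[i])
    return worst / hist[peak]


def attribution(items, group, k):
    """Hypothetical attribution score: mean a priori polarization (all
    annotators) minus mean a posteriori polarization (the group's own
    annotations), over the items the group annotated."""
    prior, posterior = [], []
    for item in items:
        in_group = [r for r, g in item if g == group]
        if not in_group:
            continue  # the group did not annotate this item
        prior.append(polarization_score([r for r, _ in item], k))
        posterior.append(polarization_score(in_group, k))
    if not prior:
        return 0.0
    return mean(prior) - mean(posterior)


def permutation_p_value(items, group, k, n_perm=1000, seed=0):
    """Generic one-sided permutation test (a swapped-in technique): shuffle
    group labels within each item and count how often the shuffled
    attribution is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = attribution(items, group, k)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for item in items:
            labels = [g for _, g in item]
            rng.shuffle(labels)
            shuffled.append([(r, g) for (r, _), g in zip(item, labels)])
        if attribution(shuffled, group, k) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: in every item, group "A" rates 1 and group "B" rates 5.
    toy = [[(1, "A")] * 3 + [(5, "B")] * 3 for _ in range(10)]
    print(attribution(toy, "A", k=5))  # 1.0
    print(permutation_p_value(toy, "A", k=5, n_perm=200))
```

Shuffling labels within each item keeps every group's per-item size fixed, so the null distribution already reflects that a small group can look unimodal simply because it contributes few ratings. This is one simple way to respect group-size imbalance under the stated assumptions, not necessarily the approach taken in the paper.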

Paper Structure

This paper contains 40 sections, 8 equations, 9 figures, 5 tables, and 1 algorithm.

Figures (9)

  • Figure 1: The apunim framework attributes annotator polarization to various annotator subgroups.
  • Figure 2: Left: Disagreement is similar to variance, overlooking cases where a minority group (green) detects something the majority group (blue) may miss. Polarization, on the other hand, identifies multiple clusters even when they are close together. Right: Example of a polarizing comment, where male and female annotators agree among themselves but disagree with the opposite gender.
  • Figure 3: Example of a polarizing discussion with two comments where annotators disagree about which one is toxic. When we aggregate the comments, we unintentionally change the comparison: instead of comparing two unimodal distributions (each comment viewed separately) with two bimodal distributions (their aggregated annotations), we end up comparing two bimodal distributions, obscuring the source of the polarization.
  • Figure 4: An overview of how annotation information is sequentially aggregated for the apunim metric. For each item, we calculate the a priori polarization and the sample a posteriori polarization given each pc group. We then aggregate both the a priori and a posteriori polarization and compare them at the dataset level to obtain the polarization attribution ("apunim") values for each group.
  • Figure 7: Apunim values across ordinal factor levels with at least two statistically significant values. The x-axis corresponds to the ordered levels of each factor and is normalized so that different ordinal scales are comparable (e.g., education, which is measured on different scales across datasets). The left side of the x-axis (lower levels) refers to young annotators, low education, low religiousness, and indifference towards the impacts of toxicity and technology. The full ordinal modes for each dimension and their individual labels are listed in the appendix.
  • ...and 4 more figures
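The distinction drawn in Figure 2 between disagreement (variance-like) and polarization (multiple clusters) can be illustrated numerically. The sketch below contrasts population variance with a simple normalized distance-from-unimodality score on two made-up rating sets. The score, the function name `polarization_score`, and the ratings are all assumptions for illustration; this page does not define the authors' polarization measure.

```python
from statistics import pvariance


def polarization_score(ratings, k):
    """Assumed stand-in polarization measure: normalized distance from
    unimodality of the rating histogram, in [0, 1]."""
    hist = [0] * k
    for r in ratings:
        hist[r - 1] += 1
    peak = max(range(k), key=lambda i: hist[i])
    if hist[peak] == 0:
        return 0.0
    worst = 0
    # A unimodal histogram never rises again after dipping away from its peak,
    # so the largest such rise measures how far it is from unimodal.
    for direction in (range(peak + 1, k), range(peak - 1, -1, -1)):
        running_min = hist[peak]
        for i in direction:
            worst = max(worst, hist[i] - running_min)
            running_min = min(running_min, hist[i])
    return worst / hist[peak]


# Hypothetical ratings on a 1..5 scale.
spread_out = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]  # broad disagreement, no clusters
two_clusters = [2, 2, 2, 2, 4, 4, 4, 4]      # two nearby camps

for name, ratings in [("spread_out", spread_out), ("two_clusters", two_clusters)]:
    print(name, float(pvariance(ratings)), polarization_score(ratings, k=5))
# spread_out 2.0 0.0
# two_clusters 1.0 1.0
```

Variance ranks the broadly spread ratings as more contentious, while the unimodality score flags only the two nearby camps, mirroring the caption's point that polarization detects multiple clusters even when they sit close together.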