Effects of algorithmic flagging on fairness: quasi-experimental evidence from Wikipedia

Nathan TeBlunthuis, Benjamin Mako Hill, Aaron Halfaker

TL;DR

The paper investigates whether algorithmic flagging can alleviate overprofiling and improve fairness in online community moderation, focusing on Wikipedia with the RCFilters system (backed by ORES). It uses a quasi-experimental regression discontinuity design around ORES score thresholds to causally estimate how flags influence moderator sanctions and controversial outcomes across 23 language editions. The findings show that flagging can increase sanctioning and, for some social signals, reduce unfair scrutiny (overprofiling) and controversial sanctions, though effects are heterogeneous and sensitive to context and design choices. The study contributes a methodological template for evaluating algorithmic decision-support tools in real-world sociotechnical systems and offers practical design guidance for moderation interfaces and fairness considerations.
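To make the regression discontinuity idea concrete, here is a minimal sketch on simulated data. It is not the authors' model: the figure captions report credible intervals, which indicates Bayesian estimates, whereas this sketch fits an ordinary least-squares line on each side of a single threshold. The function names `rd_jump` and `simulate`, the cutoff of 0.5, the bandwidth, and the simulated jump of 0.10 are all illustrative assumptions rather than values from the paper.

```python
import numpy as np


def rd_jump(score, outcome, cutoff, bandwidth):
    """Local-linear estimate of the jump in `outcome` at `cutoff`.

    Hypothetical helper: keeps observations within `bandwidth` of the
    cutoff and fits separate lines below and above it by least squares.
    """
    score = np.asarray(score, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    keep = np.abs(score - cutoff) <= bandwidth
    x = score[keep] - cutoff          # center the running variable at the cutoff
    y = outcome[keep]
    flagged = (x >= 0).astype(float)  # 1 if the score crosses the threshold
    # Columns: intercept, jump at the cutoff, slope below, change in slope above.
    design = np.column_stack([np.ones_like(x), flagged, x, flagged * x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def simulate(n=200_000, cutoff=0.5, true_jump=0.10, seed=0):
    """Simulated edits (assumed data): revert probability rises smoothly
    with the score and jumps by `true_jump` once the edit is flagged."""
    rng = np.random.default_rng(seed)
    score = rng.uniform(0.0, 1.0, n)
    p_revert = 0.05 + 0.3 * score + true_jump * (score >= cutoff)
    reverted = (rng.random(n) < p_revert).astype(float)
    return score, reverted


score, reverted = simulate()
estimate = rd_jump(score, reverted, cutoff=0.5, bandwidth=0.2)
print(f"estimated jump in revert probability at the cutoff: {estimate:.3f}")
```

The logic is that edits scored just below and just above a threshold should be similar in everything except whether they were flagged, so the fitted jump at the cutoff can be read as the effect of the flag itself. A real analysis would also need to choose the bandwidth carefully and handle several thresholds and language editions, which this sketch ignores.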

Abstract

Online community moderators often rely on social signals such as whether or not a user has an account or a profile page as clues that users may cause problems. Reliance on these clues can lead to "overprofiling" bias when moderators focus on these signals but overlook the misbehavior of others. We propose that algorithmic flagging systems deployed to improve the efficiency of moderation work can also make moderation actions more fair to these users by reducing reliance on social signals and making norm violations by everyone else more visible. We analyze moderator behavior in Wikipedia as mediated by RCFilters, a system which displays social signals and algorithmic flags, and estimate the causal effect of being flagged on moderator actions. We show that algorithmically flagged edits are reverted more often, especially those by established editors with positive social signals, and that flagging decreases the likelihood that moderation actions will be undone. Our results suggest that algorithmic flagging systems can lead to increased fairness in some contexts but that the relationship is complex and contingent.

Paper Structure

This paper contains 26 sections, 4 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: Screenshot of Wikipedia edit metadata on Special:RecentChanges with RCFilters enabled. Highlighted edits with a colored circle to the left side of other metadata are flagged by ORES. Different circles and highlight colors (white, yellow, orange and red in the figure) correspond to different levels of confidence that the edit is damaging. Users can configure which colors are shown. Visible social signals include registration status (i.e., whether a username or an IP address is shown) and whether an editor's user page and user talk page exist. RCFilters does not specifically flag edits by new accounts, but does support filtering changes by newcomers.
  • Figure 1: Marginal effects plot showing model predicted relationship between ORES score and the probability that an edit will be reverted around the cutoffs for all contributors with 95% credible intervals.
  • Figure 2: Results for RQ1 comparing unregistered and registered contributors are displayed in a marginal effects plot showing the model predicted relationship with 95% credible intervals between ORES scores and reverts around the thresholds that trigger flags.
  • Figure 3: Results for RQ1 showing point estimates and 95% credible intervals for differences in the causal effect of flagging on sanctioning between overprofiled contributors and others. A value greater than 0 indicates that our estimates of the effect for underprofiled contributors are greater than those for overprofiled contributors.
  • Figure 4: Results for RQ1 comparing contributors with and without user pages. Each panel shows a marginal effects plot with 95% credible intervals of the modeled relationship between ORES scores and reverts around the thresholds that trigger flags.
  • ...and 3 more figures