
A Critical Reflection on the Use of Toxicity Detection Algorithms in Proactive Content Moderation Systems

Mark Warner, Angelika Strohmayer, Matthew Higgs, Lynne Coventry

TL;DR

This paper critically examines embedding toxicity-detection algorithms into proactive content moderation, highlighting how context, power dynamics, and user diversity shape effectiveness and risk. It uses four design workshops (N=18) with diverse stakeholders to explore a hypothetical mobile keyboard that nudges users away from sending toxic content, and analyzes the workshop data using collaborative thematic methods. The study identifies contextual factors, an end-user intention continuum, and potential abuse vectors (validation, gamification, manipulation, circumvention) that together constrain design choices and call for harm-reduction, context-aware, and interdisciplinary design. Practically, it proposes design considerations (output presentation, user feedback, and cross-disciplinary collaboration) to mitigate these risks and support transparent, responsible proactive moderation.

Abstract

Toxicity detection algorithms, originally designed with reactive content moderation in mind, are increasingly being deployed into proactive end-user interventions to moderate content. Through a socio-technical lens and focusing on the contexts in which they are applied, we explore the use of these algorithms in proactive moderation systems. Placing a toxicity detection algorithm in an imagined virtual mobile keyboard, we critically explore how such algorithms could be used to proactively reduce the sending of toxic content. We present findings from design workshops conducted with four distinct stakeholder groups and find concerns around how contextual complexities may exacerbate inequalities in content moderation processes. Whilst only specific user groups are likely to benefit directly from these interventions, we highlight the potential for other groups to misuse them to circumvent detection, validate and gamify hate, and manipulate algorithmic models to exacerbate harm.
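To make the proactive setting concrete, here is a minimal, hypothetical sketch in Python. It is not the authors' system or the model shown in the workshops. The check runs on the draft inside the keyboard before the message is sent, and a high score triggers a prompt rather than a block. The scorer `placeholder_toxicity_score`, its word list, and the 0.2 threshold are illustrative assumptions; a real deployment would call a trained toxicity classifier instead.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# HYPOTHETICAL stand-in for a trained toxicity model: it returns the
# fraction of words in the draft that appear in a tiny illustrative list.
PLACEHOLDER_TERMS = {"idiot", "stupid"}


def placeholder_toxicity_score(text: str) -> float:
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    hits = sum(w in PLACEHOLDER_TERMS for w in words)
    return hits / len(words)


@dataclass
class Nudge:
    show_prompt: bool       # whether the keyboard should ask the user to reconsider
    score: float            # score produced for the draft
    message: Optional[str]  # prompt text; the user may still choose to send


def check_before_send(
    draft: str,
    scorer: Callable[[str], float] = placeholder_toxicity_score,
    threshold: float = 0.2,  # ASSUMED illustrative threshold
) -> Nudge:
    """Score a draft *before* it is sent (proactive), not after posting (reactive)."""
    score = scorer(draft)
    if score >= threshold:
        return Nudge(True, score, "This message may be hurtful. Review it before sending?")
    return Nudge(False, score, None)


if __name__ == "__main__":
    print(check_before_send("You are an idiot"))  # score 0.25, prompt shown
    print(check_before_send("See you tomorrow"))  # score 0.0, no prompt
```

Returning a nudge instead of blocking mirrors the prompt-style interventions the paper examines. It is also where the abuse vectors named in the abstract become tangible. For example, a sender who can see the prompt or score could rephrase repeatedly until the draft falls below the threshold, which is one form of circumvention.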

Paper Structure

This paper contains 36 sections, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Twitter's proactive moderation prompt (cropped) [katsaros2021reconsidering]
  • Figure 2: Part 1 and 2 workshop slides. Slide A supported a situation card sorting exercise; slide B supported an activity to elicit feedback on the benefits and limitations of proactive moderation systems built into a keyboard.
  • Figure 3: ML model overview and part 3 workshop slides. Slide A provided an overview of the toxicity detection model; slide B supported participants in thinking about and discussing designs of a keyboard-based proactive intervention that utilises the model.