Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster

Agostina Calabrese, Leonardo Neves, Neil Shah, Maarten W. Bos, Björn Ross, Mirella Lapata, Francesco Barbieri

TL;DR

This study asks whether explanations can speed up professional hate-speech moderators. The authors compare three settings: the post alone, the post with a generic rule-based explanation, and the post with a structured, post-specific explanation built from gold parse-tree annotations from PLEAD. Structured explanations cut per-post decision time by 1.34 seconds (about 7.4%) without reducing accuracy, while generic explanations yield no speed benefit. A follow-up moderator survey reveals a strong preference for structured explanations. These findings suggest that deploying structured explanations in moderation tools can meaningfully increase throughput on large platforms, and they can guide future development of explainable abuse-detection systems.
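To make the two quoted figures concrete, the short sketch below works out what they imply together. It assumes that the 1.34-second saving and the 7.4% reduction refer to the same baseline; the baseline time and the throughput gain are derived here for illustration and are not values reported in the summary.

```python
# Figures quoted in the TL;DR.
SAVED_SECONDS = 1.34         # absolute reduction in decision time per post
RELATIVE_REDUCTION = 0.074   # the same reduction as a fraction (7.4%)


def implied_baseline_seconds(saved: float, relative: float) -> float:
    """Baseline time per post implied by an absolute and a relative saving."""
    return saved / relative


def throughput_gain(relative: float) -> float:
    """Fractional increase in posts handled per unit of time.

    If each post takes (1 - relative) of the original time, the number of
    posts that fit in a fixed period grows by 1 / (1 - relative) - 1.
    """
    return 1.0 / (1.0 - relative) - 1.0


# Derived values (assumption: both figures share one baseline).
baseline = implied_baseline_seconds(SAVED_SECONDS, RELATIVE_REDUCTION)
with_structured = baseline - SAVED_SECONDS
gain = throughput_gain(RELATIVE_REDUCTION)

print(f"implied baseline: {baseline:.1f} s per post")
print(f"with structured explanations: {with_structured:.1f} s per post")
print(f"throughput gain: {gain:.1%}")
```

Under that assumption the implied baseline is roughly 18 seconds per post, and a 7.4% cut in time per post corresponds to about 8% more posts handled in the same period.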

Abstract

Content moderators play a key role in keeping the conversation on social media healthy. While the high volume of content they need to judge represents a bottleneck in the moderation pipeline, no studies have explored how models could help them make faster decisions. There is, by now, a vast body of research into detecting hate speech, sometimes explicitly motivated by a desire to help improve content moderation, but published research using real content moderators is scarce. In this work we investigate the effect of explanations on the speed of real-world moderators. Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision-making time by 7.4%.

Paper Structure (21 sections, 2 equations, 14 figures, 1 table)

Figures (14)

  • Figure 1: Annotation interface for setting 2 (post+label), where moderators are shown a post and a description of the rule it is deemed to violate. We intentionally chose a generic policy paragraph for this example as we are not allowed to share the content of the internal policies.
  • Figure 2: Annotation interface for setting 3 (post+tags), where moderators are shown the post with tagged spans as in PLEAD (Calabrese et al., 2022).
  • Figure 3: Effect of generic and structured explanations on the speed of each moderator (No change: $|z| < 2$).
  • Figure 4: Effect of generic explanations on the speed of individual moderators, grouped depending on which round they were shown this setting (No change: $|z| < 2$).
  • Figure 5: Effect of structured explanations on the speed of individual moderators, grouped depending on which round they were shown this setting (No change: $|z| < 2$).
  • ...and 9 more figures
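
The captions for Figures 3 to 5 label a moderator as "No change" when $|z| < 2$. This excerpt does not say how $z$ is computed, so the sketch below is only one plausible reading: it assumes a Welch-style statistic comparing a moderator's per-post decision times with and without explanations. The function names and the sample data are hypothetical and do not come from the paper.

```python
import math
from statistics import mean, variance


def z_statistic(baseline_times, treated_times):
    """Welch-style z comparing two samples of decision times (seconds).

    Positive values mean the treated setting (with explanations) is faster.
    Assumption: this is an illustrative statistic, not the paper's own test.
    """
    standard_error = math.sqrt(
        variance(baseline_times) / len(baseline_times)
        + variance(treated_times) / len(treated_times)
    )
    if standard_error == 0:
        raise ValueError("both samples are constant; z is undefined")
    return (mean(baseline_times) - mean(treated_times)) / standard_error


def classify(z, threshold=2.0):
    """Apply the |z| < 2 rule from the figure captions."""
    if abs(z) < threshold:
        return "no change"
    return "faster" if z > 0 else "slower"


# Hypothetical per-post times for one moderator, for illustration only.
post_only = [20.0, 21.0, 22.0, 23.0, 24.0]
with_tags = [10.0, 11.0, 12.0, 13.0, 14.0]

z = z_statistic(post_only, with_tags)
print(classify(z))
```

With this sign convention a large positive $z$ reads as "faster" and a large negative $z$ as "slower". The threshold of 2 is the only part taken from the captions.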