Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster

Agostina Calabrese; Leonardo Neves; Neil Shah; Maarten W. Bos; Björn Ross; Mirella Lapata; Francesco Barbieri

TL;DR

This study asks whether explanations can help professional hate-speech moderators work faster. The authors compare three settings: the post alone, the post with a generic rule-based explanation, and the post with a structured, post-specific explanation built from gold parse-tree annotations in PLEAD. Structured explanations cut per-post decision time by 1.34 seconds (about 7.4%) without reducing accuracy, while generic explanations yield no speed benefit. A follow-up survey shows that moderators strongly prefer structured explanations. These findings suggest that structured explanations in moderation tools could raise throughput on large platforms, and they offer direction for future work on explainable abuse-detection systems.

Abstract

Content moderators play a key role in keeping the conversation on social media healthy. While the high volume of content they need to judge represents a bottleneck to the moderation pipeline, no studies have explored how models could support them to make faster decisions. There is, by now, a vast body of research into detecting hate speech, sometimes explicitly motivated by a desire to help improve content moderation, but published research using real content moderators is scarce. In this work we investigate the effect of explanations on the speed of real-world moderators. Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision making time by 7.4%.
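
To put the reported figures in perspective, the arithmetic can be sketched as follows. This is a minimal illustration, not part of the paper: the function names are hypothetical, the baseline decision time is only implied by the two reported numbers (1.34 seconds and 7.4%), and the throughput figure assumes a moderator's total working time stays fixed.

```python
def implied_baseline_seconds(saved_seconds: float, relative_reduction: float) -> float:
    """Baseline per-post time implied by an absolute and a relative saving."""
    return saved_seconds / relative_reduction


def throughput_gain(relative_reduction: float) -> float:
    """Relative increase in posts handled per unit time when each post
    takes (1 - relative_reduction) of its original time."""
    return 1.0 / (1.0 - relative_reduction) - 1.0


# Reported values: 1.34 s saved per post, a 7.4% reduction.
baseline = implied_baseline_seconds(1.34, 0.074)
gain = throughput_gain(0.074)

print(f"implied baseline: {baseline:.1f} s per post")  # about 18.1 s
print(f"throughput gain:  {gain:.1%}")                 # about 8.0%
```

Under these assumptions, a 7.4% reduction in decision time corresponds to roughly 8% more posts reviewed in the same amount of time.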

Paper Structure (21 sections, 2 equations, 14 figures, 1 table)

Introduction
Related Work
Explainable Abuse Detection
Experimental Design
Data
Method
Evaluation Metrics
Do Explanations Help Moderators?
Do Moderators Want Explanations?
Conclusions
Limitations
Ethical Considerations
Experimental Design
PLEAD
Policy Adaptation
...and 6 more sections

Figures (14)

Figure 1: Annotation interface for setting 2 (post+label), where moderators are shown a post and a description of the rule it is deemed to violate. We intentionally chose a generic policy paragraph for this example as we are not allowed to share the content of the internal policies.
Figure 2: Annotation interface for setting 3 (post+tags), where moderators are shown the post with tagged spans as in PLEAD (Calabrese et al., 2022).
Figure 3: Effect of generic and structured explanations on the speed of each moderator (No change: $|z| < 2$).
Figure 4: Effect of generic explanations on the speed of individual moderators, grouped depending on which round they were shown this setting (No change: $|z| < 2$).
Figure 5: Effect of structured explanations on the speed of individual moderators, grouped depending on which round they were shown this setting (No change: $|z| < 2$).
...and 9 more figures
