ModSandbox: Facilitating Online Community Moderation Through Error Prediction and Improvement of Automated Rules

Jean Y. Song; Sangwook Lee; Jisoo Lee; Mina Kim; Juho Kim

TL;DR

ModSandbox tackles the problem of unpredictable false positives and false negatives in rule-based moderation by providing a sandboxed environment to test AutoModerator rules before deployment. It combines four design goals with four core features: a Sandbox Environment, FP/FN Recommendation, FP/FN Collection, and Automated Rule Analysis, all supported by NLP-based semantic embeddings. A user study with ten moderators demonstrates that ModSandbox helps identify problematic posts, enables more sophisticated rule creation, and offers insights for updating rules, with variations in usefulness depending on task complexity and user experience. The work suggests that such tooling can reduce cognitive load, support distributed governance, and extend rule-based moderation to broader, possibly non-expert communities. Overall, ModSandbox provides a practical framework to anticipate, diagnose, and iteratively improve automated moderation rules in real-world communities.

Abstract

Despite the common use of rule-based tools for online content moderation, human moderators still spend a lot of time monitoring them to ensure that they work as intended. Based on surveys and interviews with Reddit moderators who use AutoModerator, we identified the main challenges in reducing false positives and false negatives of automated rules: not being able to estimate the actual effect of a rule in advance and having difficulty figuring out how the rules should be updated. To address these issues, we built ModSandbox, a novel virtual sandbox system that detects possible false positives and false negatives of a rule to be improved and visualizes which part of the rule is causing issues. We conducted a user study with online content moderators, finding that ModSandbox can support quickly finding possible false positives and false negatives of automated rules and guide moderators to update those to reduce future errors.

Paper Structure (46 sections, 9 figures, 4 tables)

Introduction
Related Work
Automated Content Moderation on Social Platforms
Designing a System for Content Moderation
Background: Reddit AutoModerator
Interview: Challenges Encountered During Configuration Process
C1. No way to estimate the actual effects of a rule in advance
C2. Hard to detect false positives of the rule even after its deployment
C3. Hard to figure out how the rule should be updated
C4. Hard to understand which part of the rule is problematic
ModSandbox: System Design
Design Goals
DG1. Provide a sandbox to enable prompt configuration evaluation without affecting posts in real communities.
DG2. Provide methods for quickly discovering false positives and false negatives.
DG3. Provide a space to collect and leverage posts to identify recurring patterns.
...and 31 more sections

Figures (9)

Figure 1: A diagram that shows the relationship between the configuration process, challenges, design goals, and system features.
Figure 2: An overview of the four main features of ModSandbox. The "Sandbox Environment" is where a moderator can import all the posts from their community. A toggle button rearranges the posts in the sandbox area from the most to the least likely "Possible misses and false alarms", helping moderators find possible misses (false negatives) and false alarms (false positives) more quickly. The "FP/FN Collection" area helps moderators collect interesting posts and find patterns in them for further rule updates. The "Configuration Analysis" panel helps analyze how the rule affected the posts in the sandbox: it shows the number of filtered posts in the "Sandbox Environment" and "Post Collections" (FP/FN Collection) with color bars and highlights parts of those filtered posts in their panels (red boxes), giving macro- and micro-level support for debugging each configuration.
Figure 3: How to use the Sandbox Environment. The sandbox is first shown right after importing posts from a community. When a user clicks the "Apply" button after writing rules in the "AutoMod Configuration" panel, (a) the background turns blue for posts filtered by the rules, (b) the "Filtered by AutoMod" panel gathers them for easy browsing, and (c) a blue bar graph shows the ratio of filtered posts to imported posts.
Figure 4: Example of possible misses (false negatives) and false alarms (false positives) of the configured rules in Task 2 of our main user study. Participants were guided to detect posts asking whether or how to get CS-relevant jobs without CS-relevant degrees. Posts more likely to be misses are listed at the top (e.g., similarity 0.565 is larger than 0.558), and the opposite ordering applies to false alarms (similarity 0.152 is smaller than 0.162). The similarity values are hidden in the actual interface.
Figure 5: Shows how to use the FP/FN Collection. (a) Users can move posts from the Sandbox Environment to one of the Post Collections panels: "Posts that should be filtered (red solid arrow)" and "Posts to avoid being filtered (gray dashed arrow)". (b, c) The green and red bars show the ratio of filtered posts. (d) Posts filtered by the automated rules are marked blue in the Post Collections panel.
...and 4 more figures
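
The ordering described in the Figure 3 and Figure 4 captions (ratio of filtered posts, possible misses sorted by descending similarity, possible false alarms sorted by ascending similarity) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `embed` is a toy bag-of-words stand-in for the semantic embeddings mentioned in the TL;DR, `rule_matches` is a hypothetical keyword rule rather than real AutoModerator syntax, and similarity is assumed to be cosine similarity to a single target description.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rule_matches(post, keywords):
    """Hypothetical keyword rule: a post is filtered if it contains any keyword."""
    words = set(post.lower().split())
    return any(k in words for k in keywords)


def rank_possible_errors(posts, keywords, target):
    """Split posts by the rule, then order each side by similarity to the target.

    Unfiltered posts that look most like the target come first (possible misses);
    filtered posts that look least like the target come first (possible false alarms).
    """
    target_vec = embed(target)
    scored = [(cosine(embed(p), target_vec), p) for p in posts]
    filtered = [(s, p) for s, p in scored if rule_matches(p, keywords)]
    unfiltered = [(s, p) for s, p in scored if not rule_matches(p, keywords)]
    possible_misses = sorted(unfiltered, key=lambda sp: sp[0], reverse=True)
    possible_false_alarms = sorted(filtered, key=lambda sp: sp[0])
    filtered_ratio = len(filtered) / len(posts) if posts else 0.0
    return possible_misses, possible_false_alarms, filtered_ratio


if __name__ == "__main__":
    posts = [
        "can i get a cs job without a degree",
        "my degree ceremony photos",
        "how to get a software job with no diploma",
        "best pizza in town",
    ]
    misses, alarms, ratio = rank_possible_errors(
        posts, ["degree"], "how to get a cs job without a cs degree"
    )
    print("filtered ratio:", ratio)
    print("top possible miss:", misses[0][1])
    print("top possible false alarm:", alarms[0][1])
```

A real system would replace `embed` with a sentence-embedding model; the sorting logic would stay the same.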
