ModSandbox: Facilitating Online Community Moderation Through Error Prediction and Improvement of Automated Rules
Jean Y. Song, Sangwook Lee, Jisoo Lee, Mina Kim, Juho Kim
TL;DR
ModSandbox tackles the problem of unpredictable false positives and false negatives in rule-based moderation by providing a sandboxed environment for testing AutoModerator rules before deployment. Guided by four design goals, it offers four core features: a Sandbox Environment, FP/FN Recommendation, FP/FN Collection, and Automated Rule Analysis, all supported by NLP-based semantic embeddings. A user study with ten moderators shows that ModSandbox helps identify problematic posts, enables more sophisticated rule creation, and offers insights for updating rules, though its usefulness varies with task complexity and user experience. The work suggests that such tooling can reduce cognitive load, support distributed governance, and extend rule-based moderation to broader communities, possibly including non-expert moderators. Overall, ModSandbox provides a practical framework to anticipate, diagnose, and iteratively improve automated moderation rules in real-world communities.
Abstract
Despite the common use of rule-based tools for online content moderation, human moderators still spend a lot of time monitoring them to ensure that they work as intended. Based on surveys and interviews with Reddit moderators who use AutoModerator, we identified the main challenges in reducing false positives and false negatives of automated rules: not being able to estimate the actual effect of a rule in advance and having difficulty figuring out how the rules should be updated. To address these issues, we built ModSandbox, a novel virtual sandbox system that detects possible false positives and false negatives of a rule to be improved and visualizes which part of the rule is causing issues. We conducted a user study with online content moderators, finding that ModSandbox can help moderators quickly find possible false positives and false negatives of automated rules and guide them in updating the rules to reduce future errors.
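
To make the notion of a rule's false positives and false negatives concrete, the following is a minimal sketch of evaluating a rule against sample posts before deployment. It is not ModSandbox's implementation: the `Post` type, the `should_remove` label, the `sandbox_run` helper, the regex rule, and the sample posts are all hypothetical, and a plain keyword match stands in for an AutoModerator rule. The embedding-based recommendation described above is not modeled here.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    should_remove: bool  # hypothetical moderator judgment, used as ground truth


def rule_matches(pattern: str, post: Post) -> bool:
    """Return True if the (hypothetical) keyword rule would act on the post."""
    return re.search(pattern, post.title, flags=re.IGNORECASE) is not None


def sandbox_run(pattern: str, posts: list[Post]) -> tuple[list[Post], list[Post]]:
    """Apply the rule to sample posts without acting on them.

    Returns (false_positives, false_negatives):
    - false positives: matched by the rule, but should stay up
    - false negatives: missed by the rule, but should be removed
    """
    false_positives = [p for p in posts if rule_matches(pattern, p) and not p.should_remove]
    false_negatives = [p for p in posts if not rule_matches(pattern, p) and p.should_remove]
    return false_positives, false_negatives


# Made-up sample posts for illustration only.
posts = [
    Post("Buy cheap followers now", True),
    Post("Where can I buy a good keyboard?", False),
    Post("Ch3ap f0llowers, DM me", True),
    Post("Weekly discussion thread", False),
]

fps, fns = sandbox_run(r"\bbuy\b", posts)
for p in fps:
    print("possible false positive:", p.title)
for p in fns:
    print("possible false negative:", p.title)
```

In this toy setup, the rule wrongly flags the keyboard question and misses the obfuscated spam post, which is the kind of error a moderator would want to see before a rule goes live.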
