ModSandbox: Facilitating Online Community Moderation Through Error Prediction and Improvement of Automated Rules

Jean Y. Song, Sangwook Lee, Jisoo Lee, Mina Kim, Juho Kim

TL;DR

ModSandbox tackles the problem of unpredictable false positives and false negatives in rule-based moderation by providing a sandboxed environment to test AutoModerator rules before deployment. It combines four design goals with four core features: a Sandbox Environment, FP/FN Recommendation, FP/FN Collection, and Automated Rule Analysis, all supported by NLP-based semantic embeddings. A user study with ten moderators demonstrates that ModSandbox helps identify problematic posts, enables more sophisticated rule creation, and offers insights for updating rules, with variations in usefulness depending on task complexity and user experience. The work suggests that such tooling can reduce cognitive load, support distributed governance, and extend rule-based moderation to broader, possibly non-expert communities. Overall, ModSandbox provides a practical framework to anticipate, diagnose, and iteratively improve automated moderation rules in real-world communities.

Abstract

Despite the common use of rule-based tools for online content moderation, human moderators still spend a lot of time monitoring them to ensure that they work as intended. Based on surveys and interviews with Reddit moderators who use AutoModerator, we identified the main challenges in reducing false positives and false negatives of automated rules: not being able to estimate the actual effect of a rule in advance and having difficulty figuring out how the rules should be updated. To address these issues, we built ModSandbox, a novel virtual sandbox system that detects possible false positives and false negatives of a rule to be improved and visualizes which part of the rule is causing issues. We conducted a user study with online content moderators, finding that ModSandbox can support quickly finding possible false positives and false negatives of automated rules and guide moderators in updating those rules to reduce future errors.
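The core loop the abstract describes, applying a draft rule to a community's past posts before deployment and checking which part of the rule fires, can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration: `Post`, `KeywordRule`, `filtered_ratio`, and `phrase_hits` are hypothetical names, and the rule is reduced to a case-insensitive phrase match over title and body, a small subset of what an AutoModerator configuration can express. The two hand-picked lists stand in for the "Posts that should be filtered" and "Posts to avoid being filtered" collections.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    title: str
    body: str


@dataclass
class KeywordRule:
    """Hypothetical stand-in for a single AutoModerator check: a post is
    filtered if its title or body contains any listed phrase
    (case-insensitive substring match)."""
    phrases: list[str]

    def matching_phrases(self, post: Post) -> list[str]:
        text = f"{post.title}\n{post.body}".lower()
        return [p for p in self.phrases if p.lower() in text]

    def is_filtered(self, post: Post) -> bool:
        return bool(self.matching_phrases(post))


def filtered_ratio(rule: KeywordRule, posts: list[Post]) -> float:
    """Fraction of `posts` the rule would filter (0.0 for an empty list)."""
    if not posts:
        return 0.0
    return sum(rule.is_filtered(p) for p in posts) / len(posts)


def phrase_hits(rule: KeywordRule, posts: list[Post]) -> dict[str, int]:
    """Number of posts each phrase matches; a phrase with zero hits on
    posts that should be filtered is a candidate for rewording."""
    counts = {p: 0 for p in rule.phrases}
    for post in posts:
        for phrase in rule.matching_phrases(post):
            counts[phrase] += 1
    return counts


# Toy sandbox: posts imported from a community, plus two hand-picked collections.
posts = [
    Post("a", "Job without a CS degree?", "Can I get a dev job?"),
    Post("b", "Best laptop for college", "Any suggestions?"),
    Post("c", "No degree, bootcamp grad", "Looking for a software job"),
]
should_filter = [posts[0], posts[2]]
avoid_filtering = [posts[1]]
rule = KeywordRule(phrases=["cs degree", "without a degree"])

print(f"imported: {filtered_ratio(rule, posts):.2f}")               # imported: 0.33
print(f"should filter: {filtered_ratio(rule, should_filter):.2f}")  # should filter: 0.50
print(f"avoid: {filtered_ratio(rule, avoid_filtering):.2f}")        # avoid: 0.00
print(phrase_hits(rule, posts))  # {'cs degree': 1, 'without a degree': 0}
```

In this toy run the rule catches only one of the two posts that should be filtered: the bootcamp post uses neither listed phrase, so it is a false negative, and the per-phrase counts show that "without a degree" never matches because the post phrases it as "without a CS degree". Per-phrase counts are just one simple way to point at the clause behind an error.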

Paper Structure

This paper contains 46 sections, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: A diagram showing the relationship between the configuration process, challenges, design goals, and system features.
  • Figure 2: An overview of the four main features of ModSandbox. (1) is a "Sandbox Environment" where a moderator can import all the posts from their community. (2) is a toggle button that reorders the posts in the sandbox area from the most to the least likely "Possible misses and false alarms", helping moderators find possible misses (false negatives) and false alarms (false positives) more quickly. (3) is the "FP/FN Collection" area, where moderators collect interesting posts to find their patterns for further rule updates. (4) is the "Configuration Analysis" panel, which helps analyze how the rule affected the posts in the sandbox. It shows the number of filtered posts in the "Sandbox Environment" and "Post Collections" (FP/FN Collection) with color bars and highlights the matched parts of those filtered posts in their panels (red boxes in the Sandbox Environment and Post Collections panels), providing macro- and micro-level support for debugging each configuration.
  • Figure 3: Shows how to use the Sandbox Environment. (1) shows the sandbox right after importing posts from a community. (2) When a user clicks the "Apply" button after writing their rules in the "AutoMod Configuration" panel, (2)-(a) the background of posts filtered by the rules turns blue, (2)-(b) "Filtered by AutoMod" gathers them in a separate panel for easy browsing, and (2)-(c) a blue bar graph shows the ratio of filtered posts to imported posts.
  • Figure 4: Example of possible misses (false negatives) and false alarms (false positives) of the configured rules in Task 2 of our main user study. Participants were guided to detect posts asking whether or how to get CS-relevant jobs without CS-relevant degrees. Posts more likely to be missed are listed at the top (e.g., similarity 0.565 is listed before 0.558), and the opposite order applies to false alarms (similarity 0.152 is listed before 0.162). The similarity values are hidden in the actual interface.
  • Figure 5: Shows how to use the FP/FN Collection. (a) Users can move posts from the Sandbox Environment to one of the Post Collections panels: "Posts that should be filtered" (red solid arrow) and "Posts to avoid being filtered" (gray dashed arrow). (b, c) The green and red bars show the ratio of filtered posts in each collection. (d) Posts filtered by the automated rules are marked blue in the Post Collections panel.
  • ...and 4 more figures
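Figure 4 implies a ranking: unfiltered posts with higher similarity are shown first as possible misses, and filtered posts with lower similarity are shown first as possible false alarms. A minimal sketch of that ordering follows, under assumptions the captions do not settle: each post is already embedded as a vector by some sentence-embedding model (the toy 2-D vectors are placeholders), and "similarity" is taken to be cosine similarity to the mean embedding of the posts the rule currently filters. That reference point and the names `cosine`, `centroid`, and `rank_fp_fn` are hypothetical, not necessarily what ModSandbox does.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rank_fp_fn(embeddings: dict[str, list[float]], filtered: set[str]):
    """Rank candidate errors for a rule (hypothetical scoring).

    Returns (possible_misses, possible_false_alarms), each a list of
    (post_id, similarity) pairs:
      - misses: unfiltered posts, most similar to the filtered centroid first;
      - false alarms: filtered posts, least similar to that centroid first.
    """
    if not filtered:
        raise ValueError("need at least one filtered post as a reference")
    ref = centroid([embeddings[pid] for pid in filtered])
    scored = {pid: cosine(vec, ref) for pid, vec in embeddings.items()}
    misses = sorted(
        ((pid, s) for pid, s in scored.items() if pid not in filtered),
        key=lambda item: item[1],
        reverse=True,
    )
    false_alarms = sorted(
        ((pid, s) for pid, s in scored.items() if pid in filtered),
        key=lambda item: item[1],
    )
    return misses, false_alarms


# Toy 2-D vectors standing in for real sentence embeddings.
embeddings = {
    "p1": [1.0, 0.0],  # filtered, on-topic
    "p2": [0.9, 0.1],  # filtered, on-topic
    "p3": [0.0, 1.0],  # filtered, but off-topic
    "p4": [0.8, 0.2],  # not filtered, on-topic
    "p5": [0.1, 0.9],  # not filtered, off-topic
}
misses, false_alarms = rank_fp_fn(embeddings, filtered={"p1", "p2", "p3"})
print([pid for pid, _ in misses])        # ['p4', 'p5']
print([pid for pid, _ in false_alarms])  # ['p3', 'p1', 'p2']
```

With this reference point, the off-topic filtered post `p3` heads the possible false alarms and the on-topic unfiltered post `p4` heads the possible misses. A centroid of the moderator's "should be filtered" collection would be an equally plausible reference; the sketch does not decide between them.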