Asking For It: Question-Answering for Predicting Rule Infractions in Online Content Moderation

Mattia Samory, Diana Pamfile, Andrew To, Shruti Phadke

TL;DR

This work reframes content moderation as a rule-aware question-answering task. It introduces ModQ, with two variants—ModQ-Extract (extractive) and ModQ-Select (multiple-choice)—that reason over the full, community-specific rule sets to identify the exact rule violated by a comment. Across Reddit and Lemmy, ModQ variants outperform state-of-the-art baselines and demonstrate robust generalization to unseen communities and rules, while maintaining interpretability and lower computational demands than generation-based approaches. The approach offers practical moderation tools (flagging with rationale, user nudges) and governance insights by treating rules as a structured knowledge base that can be queried to simulate policy changes and audit enforcement alignment.

Abstract

Online communities rely on a mix of platform policies and community-authored rules to define acceptable behavior and maintain order. However, these rules vary widely across communities, evolve over time, and are enforced inconsistently, posing challenges for transparency, governance, and automation. In this paper, we model the relationship between rules and their enforcement at scale, introducing ModQ, a novel question-answering framework for rule-sensitive content moderation. Unlike prior classification or generation-based approaches, ModQ conditions on the full set of community rules at inference time and identifies which rule best applies to a given comment. We implement two model variants - extractive and multiple-choice QA - and train them on large-scale datasets from Reddit and Lemmy, the latter of which we construct from publicly available moderation logs and rule descriptions. Both models outperform state-of-the-art baselines in identifying moderation-relevant rule violations, while remaining lightweight and interpretable. Notably, ModQ models generalize effectively to unseen communities and rules, supporting low-resource moderation settings and dynamic governance environments.
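
To make the question-answering framing concrete, the following is a minimal sketch of how a comment and its community's full rule set might be packed into an extractive example (question, context, answer span) and a multiple-choice example (question, options, label). The question template, the newline separator between rules, and the names `ExtractiveExample`, `build_extractive_example`, and `build_select_example` are illustrative assumptions, not the authors' actual preprocessing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExtractiveExample:
    question: str      # the comment under review, phrased as a question
    context: str       # all community rules joined into one passage
    answer_start: int  # character offset where the violated rule begins
    answer_end: int    # character offset just past the violated rule


# Hypothetical question template; the paper's exact wording is not given here.
QUESTION_TEMPLATE = "Which rule does this comment violate? {comment}"


def build_extractive_example(
    comment: str, rules: List[str], violated: int
) -> ExtractiveExample:
    """Join the rules into a context and locate the span of the violated rule."""
    if not 0 <= violated < len(rules):
        raise ValueError("violated must index into rules")
    offset = 0
    start = end = -1
    for i, rule in enumerate(rules):
        if i == violated:
            start, end = offset, offset + len(rule)
        offset += len(rule) + 1  # +1 accounts for the newline separator
    return ExtractiveExample(
        question=QUESTION_TEMPLATE.format(comment=comment),
        context="\n".join(rules),
        answer_start=start,
        answer_end=end,
    )


def build_select_example(
    comment: str, rules: List[str], violated: int
) -> Dict[str, object]:
    """Present each rule as one answer option; the label is the option index."""
    if not 0 <= violated < len(rules):
        raise ValueError("violated must index into rules")
    return {
        "question": QUESTION_TEMPLATE.format(comment=comment),
        "options": list(rules),
        "label": violated,
    }


rules = ["Be civil.", "No spam or self-promotion.", "Stay on topic."]
example = build_extractive_example("Buy cheap watches at my site!", rules, 1)
# Slicing the context with the stored offsets recovers the violated rule.
print(example.context[example.answer_start:example.answer_end])
```

Because the rules are supplied at inference time rather than baked into a fixed label set, the same builders would apply to a community or rule never seen in training; only the `rules` list changes.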

Paper Structure

This paper contains 42 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: ModQ-Extract and ModQ-Select models presented in this paper for modeling content moderation as a question and answer task
  • Figure 2: Lemmy data preparation process, illustrating the stages of preparing Lemmy modlogs. (a) shows a typical Lemmy modlog queried from Lemmy's public API. We use GPT-O4 mini to extract structured community rules from the community description (b) and to match the removal reason provided by moderators with one of the extracted rules (c). We then prepare Q&A data for the BERT model in the form of a question, a context, and the start and end of the answer, as displayed in (d).
  • Figure 3: Macro F1 results on the Lemmy leave-N-communities-out test set.
  • Figure 4: Macro F1 results on the Lemmy leave-N-rules-out test set.
  • Figure 5: Confusion matrix produced by ModQ-Select between true (y axis) and predicted (x axis) labels