MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance

Agam Goyal, Xianyang Zhan, Yilun Chen, Koustuv Saha, Eshwar Chandrasekharan

TL;DR

MoMoE is a modular, cross-community moderation framework that ensembles lightweight, specialized experts through four operators (Allocate, Predict, Aggregate, Explain) for scalable, transparent content governance. It is instantiated in two variants, one with seven community-specialized experts and one with five norm-violation experts, to address community-specific norms while benefiting from cross-community knowledge. The best variants reach Micro-F1 scores of up to $0.72$ on unseen subreddits, with explanations generated by a post-hoc GPT-4o module. The results indicate that lightweight, explainable expert ensembles can rival fine-tuned baselines without per-community fine-tuning, while preserving moderator agency through interpretable traces and decision rationales. The work lays groundwork for human-AI collaborative governance in online forums and points to adaptive expert selection, real-time deployment, and user-centric evaluation as directions for validating practical impact.

Abstract

Large language models (LLMs) have shown great potential in flagging harmful content in online communities. Yet, existing approaches for moderation require a separate model for every community and are opaque in their decision-making, limiting real-world adoption. We introduce Mixture of Moderation Experts (MoMoE), a modular, cross-community framework that adds post-hoc explanations to scalable content moderation. MoMoE orchestrates four operators -- Allocate, Predict, Aggregate, Explain -- and is instantiated as seven community-specialized experts (MoMoE-Community) and five norm-violation experts (MoMoE-NormVio). On 30 unseen subreddits, the best variants obtain Micro-F1 scores of 0.72 and 0.67, respectively, matching or surpassing strong fine-tuned baselines while consistently producing concise and reliable explanations. Although community-specialized experts deliver the highest peak accuracy, norm-violation experts provide steadier performance across domains. These findings show that MoMoE yields scalable, transparent moderation without needing per-community fine-tuning. More broadly, they suggest that lightweight, explainable expert ensembles can guide future NLP and HCI research on trustworthy human-AI governance of online communities.

Paper Structure

This paper contains 42 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: MoMoE is composed of four modular operators. (1) Allocate: determines how to pick the relevant experts and weigh their predictions, using softmax probabilities from classification models or similarity-based scoring; (2) Predict: obtains individual expert predictions from one of two kinds of ensembles, community-specific experts or norm-violation experts; (3) Aggregate: combines the individual expert predictions into a single outcome, using strategies such as a dot product between allocated weights and expert predictions or majority voting; and (4) Explain: uses a post hoc LLM-based approach to summarize and explain MoMoE's decision output, helping moderators understand outcomes and rectify potential inconsistencies.
  • Figure 2: Performance of MoMoE on target subreddits. Both MoMoE-Community and MoMoE-NormVio are competitive with the baselines, matching or outperforming them in terms of Micro-F1 score.
  • Figure 3: F1 score comparison with dot-product-based aggregation. MoMoE-Community shows a wider range of performance across subreddits ($\approx 0.45$ to $0.8$), whereas MoMoE-NormVio gives consistent, moderate performance across subreddits ($\approx 0.65$). (${}^{\ast}p<0.05$, ${}^{\ast\ast}p<0.01$, ${}^{\ast\ast\ast}p<0.001$)
  • Figure 4: Precision-recall trade-offs of Llama-based MoMoE-Community and MoMoE-NormVio using dot-product aggregation. MoMoE-NormVio has higher recall than MoMoE-Community (mean difference $\approx 0.06$), whereas MoMoE-Community has higher precision than MoMoE-NormVio (mean difference $\approx 0.08$). (${}^{\ast}p<0.05$, ${}^{\ast\ast}p<0.01$, ${}^{\ast\ast\ast}p<0.001$)
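
The Figure 1 caption names two aggregation strategies: a dot product between allocated weights and expert predictions, and majority voting. The sketch below illustrates how the two could differ on a single comment. It is a minimal illustration, not the authors' implementation: the function names, the three-expert setup, the 0.5 decision threshold, the tie-breaking rule, and all numeric values are assumptions made for this example.

```python
import math


def softmax(scores):
    """Turn raw allocation scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_dot(weights, expert_probs):
    """Dot product of allocation weights and per-expert violation probabilities."""
    return sum(w * p for w, p in zip(weights, expert_probs))


def aggregate_majority(expert_probs, threshold=0.5):
    """Majority vote over thresholded expert decisions (ties count as 'keep')."""
    votes = sum(1 for p in expert_probs if p >= threshold)
    return votes > len(expert_probs) / 2


# Hypothetical values for one comment and three experts (not from the paper).
allocation_scores = [2.0, 0.5, 0.1]  # Allocate: relevance of each expert
expert_probs = [0.9, 0.2, 0.4]       # Predict: each expert's P(violation)

weights = softmax(allocation_scores)
score = aggregate_dot(weights, expert_probs)      # Aggregate, strategy 1
decision_dot = score >= 0.5
decision_vote = aggregate_majority(expert_probs)  # Aggregate, strategy 2

print(f"weighted score = {score:.3f}, flag = {decision_dot}")
print(f"majority vote flag = {decision_vote}")
```

In this made-up case the two strategies disagree: the heavily weighted first expert pushes the dot-product score above the threshold, while only one of three experts votes to flag the comment. This shows why the choice of Aggregate operator can change the outcome for the same set of expert predictions.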