Collaborative Content Moderation in the Fediverse

Haris Bin Zia; Aravindh Raman; Ignacio Castro; Gareth Tyson

TL;DR

This work tackles content moderation in the decentralized Fediverse, where individual instances must balance local policies against limited resources. It introduces FedMod, a peer-to-peer federated learning system in which similar instances exchange the parameters of partially trained models, improving moderation without sharing raw posts. FedMod performs robustly on three tasks (harmful content detection, bot content detection, and content warning assignment), achieving average per-instance macro-F1 scores of 0.71, 0.73, and 0.58, respectively. The paper also shows how hashsim-based peer selection, scalable pre-sampling, and policy alignment affect performance. The approach points to practical pathways for privacy-preserving, collaborative moderation in distributed social networks, and the paper outlines future work on reputation, longitudinal dynamics, and deployment with administrators.

Abstract

The Fediverse, a group of interconnected servers providing a variety of interoperable services (e.g. micro-blogging in Mastodon), has gained rapid popularity. This sudden growth, partly driven by Elon Musk's acquisition of Twitter, has nonetheless created challenges for administrators. This paper focuses on one particular challenge: content moderation, e.g. the need to remove spam or hate speech. While centralized platforms like Facebook and Twitter rely on automated tools for moderation, the dependence of those tools on massive labeled datasets and specialized infrastructure renders them impractical for decentralized, low-resource settings like the Fediverse. In this work, we design and evaluate FedMod, a collaborative content moderation system based on federated learning. Our system enables servers to exchange parameters of partially trained local content moderation models with similar servers, creating a federated model shared among collaborating servers. FedMod demonstrates robust performance on three different content moderation tasks: harmful content detection, bot content detection, and content warning assignment, achieving average per-server macro-F1 scores of 0.71, 0.73, and 0.58, respectively.
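
The parameter exchange described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how parameters are combined or how similarity is measured, so this example assumes a plain element-wise average (in the spirit of federated averaging) and uses cosine similarity over hypothetical per-instance profile vectors as a stand-in for the paper's hashsim measure. The names `cosine`, `select_peers`, `average_params`, `profiles`, and `params` are all illustrative.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_peers(me: str, profiles: Dict[str, Vector], k: int) -> List[str]:
    """Return the k instances whose profile is most similar to `me`.

    ASSUMPTION: cosine similarity over made-up profile vectors stands in
    for hashsim, which the text above does not define.
    """
    others = [name for name in profiles if name != me]
    others.sort(key=lambda name: cosine(profiles[me], profiles[name]), reverse=True)
    return others[:k]


def average_params(local: Vector, peer_params: List[Vector]) -> Vector:
    """Element-wise mean of the local parameters and those received from peers.

    ASSUMPTION: unweighted averaging; the source only says parameters are exchanged.
    """
    stacked = [local] + peer_params
    return [sum(column) / len(stacked) for column in zip(*stacked)]


# Hypothetical toy data: three instances with 2-dimensional profiles and models.
profiles = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
params = {"a": [0.0, 2.0], "b": [2.0, 4.0], "c": [10.0, 10.0]}

peers = select_peers("a", profiles, k=1)
merged = average_params(params["a"], [params[p] for p in peers])
print(peers, merged)
```

Only model parameters cross instance boundaries in this sketch; the posts used to train each local model never leave the instance, which is the privacy property the abstract emphasizes.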

Paper Structure (27 sections, 9 figures, 1 table)

Introduction
Background & Challenges
A Primer on the Fediverse
Challenges of Content Moderation
Data & Motivation
Dataset
Discovering Instances.
Collecting Posts.
Labeling Posts.
Quantifying Need for Automated Moderation
Potential of Local Models
FedMod Design
Overview of FedMod Design
Scaling up Peer Selection
FedMod Evaluation
...and 12 more sections

Figures (9)

Figure 1: Distribution of number of posts on each instance. Note the log scale on the Y-axis.
Figure 2: Average macro-F1 scores for local content moderation models across all Mastodon instances after each training step for each content moderation task.
Figure 3: Hashsim similarity across the instances.
Figure 4: Average macro-F1 scores for FedMod-based collaborative content moderation models across all Mastodon instances after each training step, using both random and hashsim peer selection for each content moderation task. $N$ denotes the number of labeled posts used up to each training step.
Figure 5: Macro-F1 scores of FedMod-based collaborative content moderation models for each content moderation task across all Mastodon instances, while varying the number of peers ($k$). $N$ denotes the number of labeled posts used up to each training step.
...and 4 more figures
