"Community Guidelines Make this the Best Party on the Internet": An In-Depth Study of Online Platforms' Content Moderation Policies
Brennan Schaffner, Arjun Nitin Bhagoji, Siyuan Cheng, Jacqueline Mei, Jay L. Shen, Grace Wang, Marshini Chetty, Nick Feamster, Genevieve Lakier, Chenhao Tan
TL;DR
The paper addresses the lack of cross-platform understanding of online content moderation by constructing OCMP-43, a large-scale, open-source pipeline that crawls, extracts, and annotates policy text from 43 major platforms. It introduces a user-centric annotation codebook and produces a labeled dataset of over 8,500 pages with tens of thousands of annotated segments, enabling quantitative cross-platform comparisons across three topics: copyright infringement, harmful speech, and misleading content. Key findings show extensive variation in policy structure and content across platforms: legal justifications dominate copyright policies, whereas policies on harmful speech and misleading content are instead justified by appeals to community values. Definitions are rare, and user recourse is limited outside copyright. The work offers insights for regulators and platforms working toward standardization and transparency, demonstrates a scalable methodology for ongoing policy tracking, and opens avenues for future user studies, audits, and longitudinal analyses.
Abstract
Moderating user-generated content on online platforms is crucial for balancing user safety and freedom of speech. Particularly in the United States, platforms are not subject to legal constraints prescribing permissible content. Each platform has thus developed bespoke content moderation policies, but there is little work towards a comparative understanding of these policies across platforms and topics. This paper presents the first systematic study of these policies from the 43 largest online platforms hosting user-generated content, focusing on policies around copyright infringement, harmful speech, and misleading content. We build a custom web scraper to obtain policy text and develop a unified annotation scheme to analyze the text for the presence of critical components. We find significant structural and compositional variation in policies across topics and platforms, with some variation attributable to disparate legal groundings. We lay the groundwork for future studies of ever-evolving content moderation policies and their impact on users.
