The DSA Transparency Database: Auditing Self-reported Moderation Actions by Social Media
Amaury Trujillo, Tiziano Fagni, Stefano Cresci
TL;DR
The paper scrutinizes the first 100 days of the DSA Transparency Database, covering eight major platforms, to audit their self-reported moderation actions. It conducts a platform-wise quantitative analysis of the submitted statements of reasons (SoRs), assessing decision grounds, restriction types, content types, timeliness, and automation, and cross-checks the findings against the platforms' own transparency reports. The study reveals only partial adherence to the database's philosophy, substantial inadequacies in the data, and notable inconsistencies, especially for X, underscoring the challenge of harmonizing reporting across heterogeneous platforms. It offers concrete recommendations for improving the database schema and reporting practices, with implications for policymakers, researchers, and the design of future cross-platform regulatory tools. The work emphasizes the unprecedented opportunity the database offers for large-scale moderation research, while cautioning against over-interpretation given that the data are self-reported and the schema is still at an early stage.
Abstract
Since September 2023, the Digital Services Act (DSA) has obliged large online platforms to submit detailed data on each moderation action they take within the European Union (EU) to the DSA Transparency Database. From its inception, this centralized database has sparked scholarly interest as an unprecedented and potentially unique trove of data on real-world online moderation. Here, we thoroughly analyze all 353.12M records submitted by the eight largest social media platforms in the EU during the first 100 days of the database. Specifically, we conduct a platform-wise comparative study of their volume of moderation actions, grounds for decision, types of applied restrictions, types of moderated content, timeliness in undertaking and submitting moderation actions, and use of automation. Furthermore, we systematically cross-check the contents of the database against the platforms' own transparency reports. Our analyses reveal that (i) the platforms adhered only in part to the philosophy and structure of the database, (ii) the structure of the database is partially inadequate for the platforms' reporting needs, (iii) the platforms exhibited substantial differences in their moderation actions, (iv) a substantial fraction of the data in the database is inconsistent, and (v) X (formerly Twitter) exhibits the most inconsistencies. Our findings have far-reaching implications for policymakers and scholars across diverse disciplines. They offer guidance for future regulations that cater to the reporting needs of online platforms in general, and they also highlight opportunities to improve and refine the database itself.
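To make the kind of per-platform metrics described above more concrete, the following is a minimal Python sketch, not the authors' analysis pipeline. It assumes SoR records are available as dictionaries whose field names (`platform_name`, `automated_decision`, `content_date`, `application_date`, `created_at`) and automation values are modeled loosely on the public DSA Transparency Database schema; these should be verified against the official documentation. The sample records, the helper names, and the transparency-report figure are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical SoR records. Field names and values are assumptions modeled
# loosely on the public DSA Transparency Database schema; verify before use.
RECORDS = [
    {"platform_name": "A", "automated_decision": "AUTOMATED_DECISION_FULLY",
     "content_date": "2023-10-01", "application_date": "2023-10-02", "created_at": "2023-10-04"},
    {"platform_name": "A", "automated_decision": "AUTOMATED_DECISION_NOT_AUTOMATED",
     "content_date": "2023-10-03", "application_date": "2023-10-03", "created_at": "2023-10-03"},
    {"platform_name": "A", "automated_decision": "AUTOMATED_DECISION_FULLY",
     "content_date": "2023-10-05", "application_date": "2023-10-04", "created_at": "2023-10-06"},
    {"platform_name": "B", "automated_decision": "AUTOMATED_DECISION_PARTIALLY",
     "content_date": "2023-10-01", "application_date": "2023-10-11", "created_at": "2023-10-12"},
]


def days_between(start: str, end: str) -> int:
    """Whole days from one ISO-8601 date string to another (negative if end precedes start)."""
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).days


def summarize(records):
    """Hypothetical helper: per-platform volume, automation share, timeliness, and a consistency check."""
    by_platform = defaultdict(list)
    for record in records:
        by_platform[record["platform_name"]].append(record)

    summary = {}
    for platform, rows in by_platform.items():
        # Delay from content creation to the moderation action,
        # and from the action to its submission to the database.
        to_action = [days_between(r["content_date"], r["application_date"]) for r in rows]
        to_submission = [days_between(r["application_date"], r["created_at"]) for r in rows]
        fully_automated = sum(r["automated_decision"] == "AUTOMATED_DECISION_FULLY" for r in rows)
        summary[platform] = {
            "volume": len(rows),
            "fully_automated_share": fully_automated / len(rows),
            "median_days_to_action": median(to_action),
            "median_days_to_submission": median(to_submission),
            # An action dated before the content it targets is logically inconsistent.
            "actions_before_content": sum(d < 0 for d in to_action),
        }
    return summary


def relative_gap(db_count: int, report_count: int) -> float:
    """Hypothetical helper: relative difference between database volume and a transparency-report figure."""
    return (db_count - report_count) / report_count


if __name__ == "__main__":
    stats = summarize(RECORDS)
    for platform, values in stats.items():
        print(platform, values)
    # Hypothetical transparency-report figure for platform A, for illustration only.
    print("A gap:", relative_gap(stats["A"]["volume"], 4))
```

The sketch uses medians for timeliness so that a few very delayed actions do not dominate the summary; this is a design choice of the example, not necessarily the statistic used in the paper. The `actions_before_content` counter illustrates one simple way internal inconsistencies in self-reported records could be flagged.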
