The DSA Transparency Database: Auditing Self-reported Moderation Actions by Social Media

Amaury Trujillo, Tiziano Fagni, Stefano Cresci

TL;DR

The paper scrutinizes the DSA Transparency Database over its first 100 days across eight major platforms to audit self-reported moderation actions. It employs a platform-wise quantitative analysis of statements of reasons (SoRs), assessing grounds for decision, restriction types, content types, timeliness, and automation, and cross-checks the findings against the platforms' own transparency reports. The study reveals only partial adherence to the database's philosophy and structure, a database structure that is partially inadequate for the platforms' reporting needs, and notable inconsistencies in the data, especially for X, underscoring the challenges of harmonizing reporting across heterogeneous platforms. It offers concrete recommendations to improve the database schema and reporting practices, highlighting implications for policymakers, researchers, and the design of future cross-platform regulatory tools. The work emphasizes the unprecedented opportunity for large-scale moderation research while cautioning against over-interpretation given self-reported data and early-stage schema development.

Abstract

Since September 2023, the Digital Services Act (DSA) obliges large online platforms to submit detailed data on each moderation action they take within the European Union (EU) to the DSA Transparency Database. From its inception, this centralized database has sparked scholarly interest as an unprecedented and potentially unique trove of data on real-world online moderation. Here, we thoroughly analyze all 353.12M records submitted by the eight largest social media platforms in the EU during the first 100 days of the database. Specifically, we conduct a platform-wise comparative study of their: volume of moderation actions, grounds for decision, types of applied restrictions, types of moderated content, timeliness in undertaking and submitting moderation actions, and use of automation. Furthermore, we systematically cross-check the contents of the database with the platforms' own transparency reports. Our analyses reveal that (i) the platforms adhered only in part to the philosophy and structure of the database, (ii) the structure of the database is partially inadequate for the platforms' reporting needs, (iii) the platforms exhibited substantial differences in their moderation actions, (iv) a remarkable fraction of the database data is inconsistent, (v) the platform X (formerly Twitter) presents the most inconsistencies. Our findings have far-reaching implications for policymakers and scholars across diverse disciplines. They offer guidance for future regulations that cater to the reporting needs of online platforms in general, but also highlight opportunities to improve and refine the database itself.

Paper Structure (34 sections, 11 figures, 3 tables)


Figures (11)

  • Figure 1: Timeline of relevant events concerning the DSA Transparency Database.
  • Figure 2: Platform-wise distribution of the infringement categories of statements of reasons (SoRs), seriated using Manhattan distance across platforms and categories, so that similar platforms are positioned close to one another.
  • Figure 3: Platform-wise distribution for decisions of the visibility type, seriated using Manhattan distance across platforms and sub-types, so that similar platforms are positioned close to one another.
  • Figure 4: Distribution of moderated content type by platform. For the detailed "other" content type (right side of the figure), we simplified the original free-text labels for space reasons.
  • Figure 5: We analyzed the timeliness of submitted moderation actions based on two lags: the lag between the creation of the infringing content and the application of the moderation decision (application delay); and the lag between this application and its communication to the DSA Transparency Database (DSA-TDB), which implies the creation of a statement of reasons (communication delay). Labels in monospaced font refer to the database fields used for computing the delays.
  • ...and 6 more figures
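
The two delays defined in the Figure 5 caption can be illustrated with a minimal Python sketch. It assumes each record carries three timestamps, given here the hypothetical names `content_date` (creation of the content), `application_date` (application of the decision), and `created_at` (creation of the statement of reasons). The caption as reproduced here does not name the fields, so these names, the helper `compute_delays`, and the sample record are illustrative assumptions rather than the authors' implementation.

```python
from datetime import datetime, timedelta


def compute_delays(content_date, application_date, created_at):
    """Return (application_delay, communication_delay) as timedeltas.

    Field names are assumptions made for this sketch:
      content_date     - when the infringing content was created
      application_date - when the moderation decision was applied
      created_at       - when the statement of reasons was created
    """
    # Lag between content creation and application of the decision.
    application_delay = application_date - content_date
    # Lag between application of the decision and its communication.
    communication_delay = created_at - application_date
    return application_delay, communication_delay


# Hypothetical record, invented purely for illustration.
record = {
    "content_date": datetime(2023, 10, 1, 8, 0),
    "application_date": datetime(2023, 10, 2, 8, 0),
    "created_at": datetime(2023, 10, 2, 9, 30),
}

app_delay, comm_delay = compute_delays(**record)
print("application delay:", app_delay)
print("communication delay:", comm_delay)
```

A negative value for either delay would mean that the corresponding timestamps are out of order in the record.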