Putting the Count Back Into Accountability: An Analysis of Transparency Data About the Sexual Exploitation of Minors

Robert Grimm

TL;DR

The paper asks whether transparency data on online child sexual exploitation (CSE) really show an "explosion" of the problem, and whether the data themselves are trustworthy. It synthesizes a granular model of CSE reporting from a critical literature review, analyzes 25 years of CyberTipline report growth against the growth of social media user accounts, and uses two comparative audits to assess data quality. Key findings: report growth is linear and largely mirrors the growth in social media user accounts; the Covid-19 pandemic had no statistically relevant impact on report volume; and about half of the surveyed organizations release meaningful and reasonably accurate data, while the other half either make no disclosures or release data with severe quality issues. The paper argues for legislative reform and for far stronger, machine-readable transparency practices to curb the data deluge, improve trust, and sustain the clearinghouse function, with practical recommendations for standardizing metrics and reporting workflows. Overall, the work replaces sensational narratives with a data-driven account of how reporting dynamics, technology, and policy interact in online CSE disclosure.

Abstract

Alarmist and sensationalist statements about the "explosion" of online child sexual exploitation (CSE) dominate much of the public discourse about the topic. Based on a new dataset collecting the transparency disclosures for 16 US-based internet platforms and the national clearinghouse collecting legally mandated reports about CSE, this study seeks answers to two research questions: First, what does the data tell us about the growth of online CSE? Second, how reliable and trustworthy is that data? To answer the two questions, this study proceeds in three parts. First, we leverage a critical literature review to synthesize a granular model for CSE reporting. Second, we analyze the growth in CSE reports over the last 25 years and correlate it with the growth of social media user accounts. Third, we use two comparative audits to assess the quality of transparency data. Critical findings include: First, US law increasingly threatens the very population it claims to protect, i.e., children and adolescents. Second, the rapid growth of CSE reports over the last decade is linear and largely driven by an equivalent growth in social media user accounts. Third, the Covid-19 pandemic had no statistically relevant impact on report volume. Fourth, while half of surveyed organizations release meaningful and reasonably accurate transparency data, the other half either fail to make disclosures or release data with severe quality issues.

Paper Structure (22 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: The relationship between illegal, detected, reported, and uniquely actionable incidents involving the sexual exploitation of minors. The color-coding of CSAM subcategories from red to green represents a coarse classification from directly harmful to mostly harmless.
  • Figure 2: Yearly CyberTipline reports (in blue, linear fit in gray, left y-axis), reports per 1,000 social media user accounts (in orange, linear fit in gray, right y-axis), and the standard residual (in magenta, right y-axis, parenthesized).
  • Figure 3: Yearly CyberTipline report counts disaggregated by platform on a log scale.
  • Figure 4: Meta: The number of photos and videos, i.e., pieces, removed per quarter on the left; number of pieces and reports per year on the right.
  • Figure 5: Mean difference plots for seven service providers individually and all of them together. Each plot has its own scale, and the x-axis is logarithmic for the eighth plot.