Revisiting Algorithmic Audits of TikTok: Poor Reproducibility and Short-term Validity of Findings

Matej Mosnar, Adam Skurla, Branislav Pecher, Matus Tibensky, Jan Jakubcik, Adrian Bindas, Peter Sakalik, Ivan Srba

TL;DR

This paper examines the reproducibility and generalisability of TikTok algorithmic audits by reimplementing and extending prior sockpuppeting studies. It shows that both audit design and platform dynamics substantially hamper replication, and that findings can shift markedly over time depending on metrics and setup. The authors find that watch duration is a stronger personalization signal than previously reported, while explicit actions exhibit an evolving exploration–exploitation pattern and may be inconsistently captured by GDPR data. They argue for longitudinal, multi-platform, automated audits with transparent data and code to robustly distinguish content evolution from policy changes, thereby improving regulatory effectiveness and audit reliability.

Abstract

Social media platforms are constantly shifting towards algorithmically curated content based on implicit or explicit user feedback. Regulators and researchers alike are calling for systematic algorithmic audits of social media, as this shift encloses users in filter bubbles and leads them to more problematic content. An important aspect of such audits is the reproducibility and generalisability of their findings, which make it possible to draw verifiable conclusions and to audit potential changes in algorithms over time. In this work, we study the reproducibility of existing sockpuppeting audits of TikTok recommender systems and the generalisability of their findings. In our efforts to reproduce the previous works, we find multiple challenges stemming from social media platform changes and content evolution, but also from the research works themselves. These drawbacks limit audit reproducibility and require extensive effort, together with inevitable adjustments to the auditing methodology. Our experiments also reveal that these one-shot audit findings often hold only in the short term, implying that the reproducibility and generalisability of the audits depend heavily on the methodological choices and the state of algorithms and content on the platform. This highlights the importance of reproducible audits that allow us to determine how the situation changes over time.

Paper Structure

This paper contains 7 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The similarity in feeds of control and personalised users for the location personalisation factor.
  • Figure 2: Comparison between the like and watch actions using the percentage of videos that contain predefined user interests. We can observe a steady increase for the watch action and a strong exploration aspect for the like action at the start, followed by strong exploitation.
  • Figure 3: Comparison between the effects of different watch durations using the percentage of videos that contain predefined user interests. Watching videos for longer provides a stronger personalisation effect.
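The two quantities named in the captions above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a video "contains" a user interest when one of its hashtags matches a predefined interest hashtag, and it assumes Jaccard overlap of video IDs as the feed similarity measure. The function names `interest_share` and `feed_overlap` are hypothetical.

```python
from typing import Iterable, Set


def interest_share(feed: Iterable[Set[str]], interests: Set[str]) -> float:
    """Percentage of videos whose hashtags overlap the interest set.

    Assumption: interests are matched via hashtags, case-insensitively.
    """
    videos = list(feed)
    if not videos:
        return 0.0
    wanted = {tag.lower() for tag in interests}
    # A video counts once, no matter how many interest hashtags it carries.
    hits = sum(1 for tags in videos if wanted & {t.lower() for t in tags})
    return 100.0 * hits / len(videos)


def feed_overlap(feed_a: Iterable[str], feed_b: Iterable[str]) -> float:
    """Jaccard similarity of two feeds given as video IDs (assumed metric)."""
    a, b = set(feed_a), set(feed_b)
    if not a and not b:
        return 1.0  # two empty feeds are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical data: each video is represented by its set of hashtags.
feed = [{"football", "fyp"}, {"cooking"}, {"Football"}, {"dance"}]
share = interest_share(feed, {"football"})

# Hypothetical video IDs seen by a control user and a personalised user.
overlap = feed_overlap(["v1", "v2", "v3"], ["v2", "v3", "v4"])
```

Computing `interest_share` per batch of consecutive videos, rather than over the whole feed, would give the kind of over-time curve the Figure 2 and Figure 3 captions describe.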