Conspiracy theories and where to find them on TikTok

Francesco Corso; Francesco Pierri; Gianmarco De Francisci Morales

TL;DR

This study quantitatively characterizes conspiracy theories on TikTok using a longitudinal U.S. dataset of about 1.6 million long videos (1,605,696) from about 1.2 million users (1,178,303), collected via the TikTok Research API. It combines hashtag enrichment with manual validation to identify conspiratorial content, estimates a lower-bound prevalence (up to about 1,000 videos per month in 2023), and analyzes the Creativity Program's impact on overall video duration. It also evaluates open-weight LLMs (Llama3, Mistral, Gemma) for conspiracy detection from audio transcripts, showing high precision in some configurations (e.g., a precision of about 0.96 with Step-by-step prompts) but notable trade-offs compared to fine-tuned RoBERTa, underscoring both opportunities and limitations for scalable content moderation. The findings inform moderation strategies and policy design while highlighting the importance of prompt design, data quality, and resource considerations in deploying LLM-based detection systems at scale.

Abstract

TikTok has skyrocketed in popularity over recent years, especially among younger audiences. However, there are public concerns about the potential of this platform to promote and amplify harmful content. This study presents the first systematic analysis of conspiracy theories on TikTok. By leveraging the official TikTok Research API, we collect a longitudinal dataset of 1.5M videos shared in the U.S. over three years. We estimate a lower bound on the prevalence of conspiratorial videos (up to 1000 new videos per month) and evaluate the effects of TikTok's Creativity Program for monetization, observing an overall increase in video duration regardless of content. Lastly, we evaluate the capabilities of state-of-the-art open-weight Large Language Models to identify conspiracy theories from audio transcriptions of videos. While these models achieve high precision in detecting harmful content (up to 96%), their overall performance remains comparable to fine-tuned traditional models such as RoBERTa. Our findings suggest that Large Language Models can serve as an effective tool for supporting content moderation strategies aimed at reducing the spread of harmful content on TikTok.
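The abstract contrasts high precision with overall performance. As a minimal sketch of why the two can diverge, the snippet below computes precision, recall, and F1 on the positive class (conspiracy videos) for two hypothetical classifiers. The labels, the classifier names `cautious` and `balanced`, and the helper `positive_class_metrics` are illustrative assumptions, not data or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class PositiveClassMetrics:
    precision: float
    recall: float
    f1: float


def positive_class_metrics(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return PositiveClassMetrics(precision, recall, f1)


# Made-up toy labels: 1 = conspiracy video, 0 = other video.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
cautious = [1, 1, 0, 0, 0, 0, 0, 0]  # flags few videos, no false alarms
balanced = [1, 1, 1, 0, 1, 0, 0, 0]  # flags more videos, one false alarm

m_cautious = positive_class_metrics(y_true, cautious)
m_balanced = positive_class_metrics(y_true, balanced)
print(m_cautious)
print(m_balanced)
```

In this toy setup the cautious classifier has the higher precision but misses half of the positive examples, so its F1 is lower than that of the balanced classifier. This is one way a model with very high precision can still end up with overall performance comparable to, or below, a fine-tuned baseline.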

Paper Structure (24 sections, 3 equations, 11 figures, 2 tables)

Introduction
Related Work
Methods
Dataset
Conspiracy Hashtag Enrichment
Evaluation of Conspiracy Hashtags
Estimating the Number of U.S. Videos on TikTok
Video Transcripts
Large Language Models and Prompting Strategies
Results
Lower Bound on the Number of Conspiracy Videos on TikTok
Impact of the Creativity Program on Conspiratorial Content
Detecting Conspiracy Theories with LLMs
Discussion and Conclusion
Appendix
...and 9 more sections

Figures (11)

Figure 1: (a) Estimated number of long videos on TikTok U.S. per month. (b) Percentage of conspiracy videos. (c) Estimated number of long conspiracy videos on TikTok U.S.
Figure 2: Distribution of the duration in seconds of videos created before (N=42.0) and after (N=103.0) the beginning of the Creativity Program (May 3, 2023). The vertical blue line indicates 60 seconds. Medians: before = 13.4 s, after = 14.7 s.
Figure 3: Aggregated results on the positive class (conspiracy videos) of the balanced classification experiment with distant labels (C1), for the three prompts and the three models plus the ensemble, across all seeds. The red line is the result of the fine-tuning, 95% C.I. for RoBERTa.
Figure 4: Aggregated results on the positive class (conspiracy videos) of the unbalanced classification experiment with manual labels (C3), for the three prompts and the three models plus the ensemble, across all seeds. The red line is the result of the fine-tuning, 95% C.I. for RoBERTa.
Figure 5: Aggregated results on the positive class (conspiracy videos) of the balanced classification experiment with distant labels (C1), for the three prompts and the three models plus the ensemble, across all seeds, split by quartiles on the distribution of the length of the transcriptions in words. The red line is the result of the fine-tuning, 95% C.I. for RoBERTa. Q1 = 210 words, Q2 = 325 words, Q3 = 472 words, Q4 = 1919 words.
...and 6 more figures
