What we can learn from TikTok through its Research API
Francesco Corso, Francesco Pierri, Gianmarco De Francisci Morales
TL;DR
This study evaluates the reliability and utility of TikTok's official Research API by constructing a random, monthly-stratified sample of over 500k videos spanning 2018–2023. It analyzes API quotas, data availability, temporal patterns, regional distribution, and engagement metrics, revealing substantial quota shortfalls and notable data gaps in 2018. The results show a strong regional skew toward Asia (led by India) and a measurable engagement uplift for videos that use viral hashtags, whereas conspiracy-related hashtags appear to be relatively rare. The findings offer practical guidance for researchers using the API, highlight biases and data-quality concerns that affect API-based inference, and underscore the need for greater transparency.
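To make "monthly-stratified" random sampling concrete, the following minimal Python sketch groups an already-collected pool of videos by calendar month of creation and draws up to a fixed number of them uniformly at random from each month. It is a generic illustration under stated assumptions, not the authors' collection procedure: the (video_id, created_at) record shape, the function name stratified_monthly_sample, and the per_month and seed parameters are hypothetical and are not part of the TikTok Research API.

```python
import random
from collections import defaultdict
from datetime import datetime


def stratified_monthly_sample(records, per_month, seed=0):
    """Draw up to `per_month` IDs uniformly at random from each calendar month.

    `records` is an iterable of (video_id, created_at) pairs with `created_at`
    a datetime; this record shape is a hypothetical assumption.
    """
    rng = random.Random(seed)  # seeded for reproducible draws

    # Partition the pool into strata keyed by (year, month).
    strata = defaultdict(list)
    for video_id, created_at in records:
        strata[(created_at.year, created_at.month)].append(video_id)

    # Sample without replacement inside each stratum; sparse months
    # contribute everything they have.
    return {
        month: rng.sample(strata[month], min(per_month, len(strata[month])))
        for month in sorted(strata)
    }


if __name__ == "__main__":
    pool = [
        ("a", datetime(2018, 1, 5)),
        ("b", datetime(2018, 1, 20)),
        ("c", datetime(2018, 1, 30)),
        ("d", datetime(2018, 2, 1)),
    ]
    print(stratified_monthly_sample(pool, per_month=2))
```

A month that yields fewer videos than per_month (for example, one affected by the 2018 gaps mentioned above) simply contributes a smaller stratum, so per-month counts are worth checking before comparing months.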
Abstract
TikTok is a social media platform that has gained immense popularity over the last few years, particularly among younger demographics, thanks to the viral trends and challenges shared worldwide. The recent release of a free Research API opens the door to collecting data on posted videos, their associated comments, and user activity. Our study evaluates the reliability of the results returned by the Research API by collecting and analyzing a random sample of TikTok videos posted over a span of six years. Our preliminary results are instrumental for future research on the platform, as they highlight caveats about the geographical distribution of videos and the global prevalence of viral and conspiratorial hashtags.
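The hashtag prevalence and engagement comparisons mentioned above can be sketched as follows. This minimal Python example assumes each sampled video is a dict with hypothetical "hashtags" and "views" keys and takes a caller-supplied watchlist (for instance, a set of viral or conspiracy-related hashtags). It reports the share of videos carrying at least one watchlist hashtag and the ratio of median views with versus without one. The median ratio is one simple, assumed definition of engagement uplift, not necessarily the measure used in the study, and the function name hashtag_stats is illustrative.

```python
from statistics import median


def hashtag_stats(videos, watchlist):
    """Return (prevalence, uplift) for a hypothetical list of video dicts.

    Each video is assumed to look like {"hashtags": [str, ...], "views": int};
    these keys are illustrative, not Research API field names.
    """
    watch = {tag.lower() for tag in watchlist}

    def uses_watchlist(video):
        # Case-insensitive match against the watchlist.
        return any(tag.lower() in watch for tag in video["hashtags"])

    tagged = [v["views"] for v in videos if uses_watchlist(v)]
    untagged = [v["views"] for v in videos if not uses_watchlist(v)]

    # Share of videos carrying at least one watchlist hashtag.
    prevalence = len(tagged) / len(videos) if videos else 0.0

    # Ratio of median views; undefined (None) if either group is empty
    # or the baseline median is zero.
    if not tagged or not untagged or median(untagged) == 0:
        uplift = None
    else:
        uplift = median(tagged) / median(untagged)
    return prevalence, uplift
```

The median is used rather than the mean because view counts are typically heavy-tailed; note that this raw ratio does not control for confounders such as account size or posting date.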
