Dynamics of Algorithmic Content Amplification on TikTok

Fabian Baumann, Nipun Arora, Iyad Rahwan, Agnieszka Czaplicka

TL;DR

This paper asks how quickly and to what extent TikTok’s For You feed amplifies content that aligns with a user’s interests. Using a sock-puppet audit with GPT-3.5-turbo–driven relevance assessment, the authors track time-series signals $r_{\alpha,i}(t)$ and cumulative counts $C_{\alpha,i}(t)$ across Gaming, Food, and Gaming+Food bots, uncovering rapid amplification typically within the first $t_o$ videos and strong topic-specific biases. They model the dynamics with Markov and Hidden Markov Models, revealing elevated transition probabilities toward interest content and varying latent-state complexity across conditions, with Gaming showing the strongest amplification. Additionally, amplified content tends to be less popular and longer, while exploration via hashtag diversity declines as personalization intensifies, indicating a trade-off between personalization and content diversity and highlighting potential socio-algorithmic feedback loops. The study provides empirical evidence of how personalization may constrain exposure to new topics and discusses limitations and directions for broader, longer-term investigations.

Abstract

Intelligent algorithms increasingly shape the content we encounter and engage with online. TikTok's For You feed exemplifies extreme algorithm-driven curation, tailoring the stream of video content almost exclusively based on users' explicit and implicit interactions with the platform. Despite growing attention, the dynamics of content amplification on TikTok remain largely unquantified. How quickly, and to what extent, does TikTok's algorithm amplify content aligned with users' interests? To address these questions, we conduct a sock-puppet audit, deploying bots with different interests to engage with TikTok's "For You" feed. Our findings reveal that content aligned with the bots' interests undergoes strong amplification, with rapid reinforcement typically occurring within the first 200 videos watched. While amplification is consistently observed across all interests, its intensity varies by interest, indicating the emergence of topic-specific biases. Time series analyses and Markov models uncover distinct phases of recommendation dynamics, including persistent content reinforcement and a gradual decline in content diversity over time. Although TikTok's algorithm preserves some content diversity, we find a strong negative correlation between amplification and exploration: as the amplification of interest-aligned content increases, engagement with unseen hashtags declines. These findings contribute to discussions on socio-algorithmic feedback loops in the digital age and the trade-offs between personalization and content diversity.

Paper Structure

This paper contains 14 sections, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Sock-puppet audit and content stream analysis. Panel A provides a schematic overview of the sock-puppet audit methodology. The left side illustrates the stream of video content that bots scroll through, where green and red represent interest-aligned and non-aligned videos, respectively. Time is measured as the discrete number of videos, $t \in \{1, \dots, N\}$, with $t=1$ corresponding to the first video a bot encounters at the start of an experimental run. Bots analyze each video by extracting its meta-information (description and hashtags) and processing it through a large language model, which determines whether the content aligns with the bot's predefined interests. If a video matches the bot’s interests, the bot interacts with the platform in three ways: (i) rewatching the video, (ii) liking the video, and (iii) following the video's creator. Panel B illustrates how key quantities for our analysis are derived from the binary content stream, which encodes whether a video at time $t$ matches the bot’s interest ($S_{\alpha, i}(t) = 1$) or not ($S_{\alpha, i}(t) = 0$). The upper plot schematically depicts the rate of interest-aligned videos over time, $r_{\alpha, i}(t)$, obtained by convolving $S_{\alpha, i}(t)$ with a uniform kernel. The lower plot illustrates the cumulative count of interest-aligned videos, $C_{\alpha, i}(t)$, highlighting its monotonically increasing nature and its relationship to $S_{\alpha, i}(t)$.
  • Figure 2: Content streams for bots interested in Gaming content. The solid green line represents the rate of interest-aligned content, $r_{G, i}(t)$, and the dashed red line corresponds to the cumulative count of interest-aligned videos, $C_{G, i}(t)$, both derived from $S_{G, i}(t)$. The green band spans the range between the two baselines $b_1$ (dashed line) and $b_2$ (dotted line).
  • Figure 3: Content streams for bots interested in Food content. The solid green line represents the rate of interest-aligned content, $r_{F, i}(t)$, and the dashed red line corresponds to the cumulative count of interest-aligned videos, $C_{F, i}(t)$, both derived from $S_{F, i}(t)$. The green band spans the range between the two baselines $b_1$ (dashed line) and $b_2$ (dotted line).
  • Figure 4: Content streams for bots interested in Gaming and Food content. The solid green line represents the rate of interest-aligned content, $r_{GF, i}(t)$, while the dashed red line denotes the cumulative count of interest-aligned videos, $C_{GF, i}(t)$, both derived from $S_{GF, i}(t)$. For dual-interest bots, $S_{GF, i}(t)$ is computed using a logical OR operation, meaning $S_{GF, i}(t) = 1$ if the video at time $t$ belongs to Gaming, to Food, or to both categories; otherwise $S_{GF, i}(t) = 0$. The orange (blue) line depicts the rate of Gaming (Food) content. The green band spans the range between the two baseline frequencies, $b_1$ (dashed line) and $b_2$ (dotted line), where the baseline $b_2$ is computed as the average of the corresponding baselines of the single-interest bots.
  • Figure 5: Cumulative counts of interest-aligned videos. The dark red bars represent the actual number of interest-aligned videos encountered by the bot, compared to the expected counts derived from the two baseline frequencies, $b_1$ and $b_2$. The percentages indicate the proportion of interest-aligned videos relative to the total number of videos watched.
  • ...and 8 more figures
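
The quantities named in the captions of Figures 1 and 4 can be illustrated with a short sketch. It computes the rate $r_{\alpha, i}(t)$ by convolving a binary stream $S_{\alpha, i}(t)$ with a uniform kernel, the cumulative count $C_{\alpha, i}(t)$ as a running sum, and the dual-interest stream $S_{GF, i}(t)$ as a logical OR. The function names, the window width of 4, the edge handling (`mode="valid"`), and the toy streams are all assumptions made for illustration; the captions do not specify the kernel width or boundary treatment used by the authors.

```python
import numpy as np

def interest_rate(S, window):
    """Rate of interest-aligned videos: S convolved with a uniform kernel.

    `window` (kernel width) is a hypothetical parameter, not taken from the paper.
    mode="valid" keeps only positions where the kernel fully overlaps the stream.
    """
    S = np.asarray(S, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(S, kernel, mode="valid")

def cumulative_count(S):
    """Cumulative count of interest-aligned videos (monotonically non-decreasing)."""
    return np.cumsum(np.asarray(S, dtype=int))

def dual_interest_stream(S_gaming, S_food):
    """Dual-interest stream: 1 if the video is Gaming, Food, or both; else 0."""
    return np.logical_or(S_gaming, S_food).astype(int)

# Toy binary streams (made up for illustration, not data from the study).
S_G = [0, 1, 1, 0, 1, 1, 1, 0]
S_F = [1, 0, 0, 0, 0, 1, 0, 0]

r_G = interest_rate(S_G, window=4)
C_G = cumulative_count(S_G)
S_GF = dual_interest_stream(S_G, S_F)
```

A wider window smooths $r_{\alpha, i}(t)$ more strongly but shortens the output under `mode="valid"`, so the choice trades temporal resolution against noise.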