Tracking the 2024 US Presidential Election Chatter on TikTok: A Public Multimodal Dataset

Gabriela Pinto, Charles Bickham, Tanishq Salkar, Joyston Menezes, Luca Luceri, Emilio Ferrara

TL;DR

The paper introduces a public multimodal dataset tracking TikTok discourse around the 2024 U.S. presidential election, collecting over 3 million videos with metadata and transcripts to study political communication, misinformation, and engagement. It details data collection via the TikTok Research API, supplementing with Whisper-generated transcripts and a phased collection plan, while ensuring ethical compliance. Through exploratory analyses including language detection, hashtag co-occurrence networks, and topic modeling with BERTopic and LLaMA 3.1 summarization, the work reveals clear ideological clustering, dominant language usage, and emergent themes such as immigration and conspiracy theories. The resource enables researchers to examine coordination, discourse evolution, and platform-specific dynamics, contributing to understanding digital democracy and election-related communication on social media. The dataset is publicly available, with ongoing updates and future work to incorporate comments and broader linguistic coverage.

Abstract

This paper presents the TikTok 2024 U.S. Presidential Election Dataset, a large-scale resource designed to advance research into political communication and social media dynamics. The dataset comprises 3.14 million videos published on TikTok between November 1, 2023, and October 16, 2024, encompassing video IDs and transcripts. Data collection was conducted using the TikTok Research API with a comprehensive set of election-related keywords and hashtags, supplemented by third-party tools to address API limitations and expand content coverage. The dataset supports analysis of hashtag co-occurrence networks that reveal politically aligned hashtags based on ideological affiliations, the evolution of top hashtags over time, and summary statistics that highlight the dataset's scale and richness. It offers insights into TikTok's role in shaping electoral discourse by providing a multimodal view of election-related content, and it enables researchers to explore critical topics such as coordinated messaging, misinformation spread, audience engagement, and linguistic trends. The TikTok 2024 U.S. Presidential Election Dataset is publicly available and aims to contribute to the broader understanding of social media's impact on democracy and public opinion.
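
As a rough illustration of what a hashtag co-occurrence network is built from, the sketch below counts how often two hashtags appear on the same video; each counted pair would become a weighted edge in the graph. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `cooccurrence_counts`, the input shape (one list of hashtags per video), and the sample hashtags are all hypothetical and are not taken from the dataset.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(videos):
    """Count how often each unordered pair of hashtags appears on the same video."""
    pair_counts = Counter()
    for hashtags in videos:
        # Normalize case and drop duplicates so a pair counts once per video.
        unique_tags = sorted({tag.lower() for tag in hashtags})
        for a, b in combinations(unique_tags, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical toy input: one hashtag list per video (not real dataset records).
sample_videos = [
    ["Trump2024", "MAGA"],
    ["maga", "trump2024", "election2024"],
    ["Biden2024", "election2024"],
]

edges = cooccurrence_counts(sample_videos)
```

Sorting the hashtags before pairing them keeps each edge key in a canonical order, so `("maga", "trump2024")` and `("trump2024", "maga")` are never counted separately. The resulting counts could then be loaded into a graph library as edge weights for clustering or visualization.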

Paper Structure

This paper contains 4 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Timeline of events and volume of TikTok posts.
  • Figure 2: Hashtag Co-Occurrence Graph
  • Figure 3: Hashtag usage trends by category from November 2023 to October 2024, showing peak activity for Republican, Democratic, Neutral, and Third-Party hashtags over time.