A Multi-Platform Collection of Social Media Posts about the 2022 U.S. Midterm Elections

Rachith Aiyappa, Matthew R. DeVerna, Manita Pote, Bao Tran Truong, Wanying Zhao, David Axelrod, Aria Pessianzadeh, Zoher Kachwala, Munjung Kim, Ozgur Can Seckin, Minsuk Kim, Sunny Gandhi, Amrutha Manikonda, Francesco Pierri, Filippo Menczer, Kai-Cheng Yang

TL;DR

The paper introduces MEIU22, a multi-platform dataset of social media posts about the 2022 U.S. midterm elections. It addresses a limitation of single-platform analyses by collecting data from Twitter, Facebook, Instagram, Reddit, and 4chan. It presents a two-VM data architecture, an iterative snowball keyword expansion, and a comprehensive roster of candidate handles, which together enable simultaneous collection of general and candidate-related content from October 1 to December 25, 2022. It then details the platform-specific collection pipelines (the Twitter streaming API; CrowdTangle for Facebook and Instagram; the Ad Library for political ads; Pushshift for Reddit; crawling for 4chan), followed by data cleaning, quality evaluation, and volume statistics, and releases the data and code publicly. The dataset supports cross-platform analyses of information diffusion, manipulation campaigns, and political communication strategies, including how influence operations spread across networks.
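
The TL;DR mentions an iterative snowball keyword expansion. The paper's exact procedure is not reproduced here; the sketch below assumes a common variant: start from seed hashtags, count terms that co-occur with the current keyword set, vet the most frequent ones, and repeat until no new term is accepted. `snowball_keywords`, `fetch_posts`, and `approve` are hypothetical names, and `approve` stands in for whatever relevance check the authors applied.

```python
from collections import Counter
from typing import Callable, Iterable


def snowball_keywords(
    seeds: set[str],
    fetch_posts: Callable[[set[str]], Iterable[set[str]]],
    approve: Callable[[str], bool],
    top_k: int = 20,
    max_rounds: int = 5,
) -> set[str]:
    """Grow a keyword set by vetting the most frequent co-occurring terms.

    fetch_posts(keywords) is a hypothetical callback returning the term sets
    (e.g. hashtags) of posts retrieved with the current keywords;
    approve(term) is a hypothetical stand-in for a relevance review.
    """
    keywords = {s.lower() for s in seeds}
    rejected: set[str] = set()
    for _ in range(max_rounds):
        counts: Counter[str] = Counter()
        for terms in fetch_posts(keywords):
            terms = {t.lower() for t in terms}
            # Only posts matching the current keyword set contribute candidates.
            if terms & keywords:
                counts.update(terms - keywords - rejected)
        new_terms = set()
        for term, _count in counts.most_common(top_k):
            if approve(term):
                new_terms.add(term)
            else:
                rejected.add(term)  # never re-propose a rejected term
        if not new_terms:
            break  # converged: no new relevant terms this round
        keywords |= new_terms
    return keywords
```

Capping `top_k` and `max_rounds` bounds how many candidate terms must be reviewed per round, which matters if the vetting step is manual.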

Abstract

Social media are utilized by millions of citizens to discuss important political issues. Politicians use these platforms to connect with the public and broadcast policy positions. Therefore, data from social media has enabled many studies of political discussion. While most analyses are limited to data from individual platforms, people are embedded in a larger information ecosystem spanning multiple social networks. Here we describe and provide access to the Indiana University 2022 U.S. Midterms Multi-Platform Social Media Dataset (MEIU22), a collection of social media posts from Twitter, Facebook, Instagram, Reddit, and 4chan. MEIU22 links to posts about the midterm elections based on a comprehensive list of keywords and tracks the social media accounts of 1,011 candidates from October 1 to December 25, 2022. We also publish the source code of our pipeline to enable similar multi-platform research projects.
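
The abstract describes two collection streams: posts matching a keyword list and posts from tracked candidate accounts, both restricted to October 1 to December 25, 2022. The following is a minimal sketch of how a single post might be routed to those streams, assuming posts were already normalized into a common record. The `Post` fields, `build_matcher`, and `classify` are hypothetical. Some platform endpoints (such as Twitter's streaming API) filter by keyword server-side, so treat this as a local re-check rather than the authors' pipeline.

```python
import re
from dataclasses import dataclass
from datetime import date

# Collection window stated in the abstract (inclusive).
WINDOW_START = date(2022, 10, 1)
WINDOW_END = date(2022, 12, 25)


@dataclass(frozen=True)
class Post:
    # Hypothetical normalized record shared across platforms.
    platform: str
    author: str
    created: date
    text: str


def build_matcher(keywords: list[str]) -> re.Pattern[str]:
    """Compile a case-insensitive, whole-term regex for the keyword list."""
    if not keywords:
        raise ValueError("keyword list must not be empty")
    # Longest keywords first so overlapping terms prefer the longer match.
    alternation = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def classify(post: Post, matcher: re.Pattern[str], handles: set[str]) -> set[str]:
    """Return the stream(s) a post belongs to; handles must be lowercase."""
    if not (WINDOW_START <= post.created <= WINDOW_END):
        return set()
    labels = set()
    if matcher.search(post.text):
        labels.add("keyword")  # general midterm-related content
    if post.author.lower() in handles:
        labels.add("candidate")  # authored by a tracked candidate account
    return labels
```

A post can land in both streams, which is why `classify` returns a set rather than a single label.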

Paper Structure

This paper contains 18 sections, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Architecture of the MEIU22 data collection and analysis system. Data flows in the direction of the arrows.
  • Figure 2: Daily volume of midterm-related posts collected through the keyword-matching approach from each platform. For Reddit, submissions and comments are counted together. For advertisements, Facebook and Instagram ads are counted together. Election day (November 8) and the Georgia runoff (December 6) are annotated.
  • Figure 3: Daily volume of tweets and posts generated by the congressional candidates on Twitter and Facebook.
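
Figure 2's caption fixes two aggregation rules: Reddit submissions and comments form one series, and Facebook and Instagram ads form another. The following is a minimal sketch of that daily count, assuming each collected item has been reduced to a hypothetical `(platform, kind, day)` tuple; `series_label` and `daily_volume` are illustrative names, not functions from the released code.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Iterable

# Dates annotated in Figure 2.
ELECTION_DAY = date(2022, 11, 8)
GEORGIA_RUNOFF = date(2022, 12, 6)


def series_label(platform: str, kind: str) -> str:
    """Map a (platform, post kind) pair to a Figure 2 series."""
    if platform == "reddit":
        return "reddit"  # submissions and comments are combined
    if kind == "ad" and platform in {"facebook", "instagram"}:
        return "ads"  # Facebook and Instagram ads are combined
    return platform


def daily_volume(records: Iterable[tuple[str, str, date]]) -> dict[str, Counter[date]]:
    """Count items per series per day from (platform, kind, day) records."""
    volume: defaultdict[str, Counter[date]] = defaultdict(Counter)
    for platform, kind, day in records:
        volume[series_label(platform, kind)][day] += 1
    return dict(volume)
```

Each resulting `Counter` can be plotted as one line, with vertical markers at `ELECTION_DAY` and `GEORGIA_RUNOFF`. Restricting the input to candidate-authored posts on Twitter and Facebook would yield series of the kind shown in Figure 3.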