A Multi-Platform Collection of Social Media Posts about the 2022 U.S. Midterm Elections
Rachith Aiyappa, Matthew R. DeVerna, Manita Pote, Bao Tran Truong, Wanying Zhao, David Axelrod, Aria Pessianzadeh, Zoher Kachwala, Munjung Kim, Ozgur Can Seckin, Minsuk Kim, Sunny Gandhi, Amrutha Manikonda, Francesco Pierri, Filippo Menczer, Kai-Cheng Yang
TL;DR
The paper introduces MEIU22, a multi-platform dataset of social media posts about the 2022 U.S. midterm elections. It addresses the limitation of single-platform analyses by covering Twitter, Facebook, Instagram, Reddit, and 4chan. The authors present a two-VM (virtual machine) data architecture, iterative snowball keyword expansion, and a comprehensive roster of candidate handles, which together enable simultaneous collection of general and candidate-related content from October 1 to December 25, 2022. The paper details the platform-specific collection pipelines (the streaming API for Twitter; CrowdTangle for Facebook and Instagram; the Ads Library for political ads; Pushshift for Reddit; crawling for 4chan), followed by cleaning, quality evaluation, and volume statistics, and it releases the data and code publicly. The work supports cross-platform analysis of information diffusion, manipulation campaigns, and political communication strategies, and is aimed at researchers studying multi-network election discourse and the dynamics of cross-platform influence operations.
Abstract
Social media are utilized by millions of citizens to discuss important political issues. Politicians use these platforms to connect with the public and broadcast policy positions. Therefore, data from social media has enabled many studies of political discussion. While most analyses are limited to data from individual platforms, people are embedded in a larger information ecosystem spanning multiple social networks. Here we describe and provide access to the Indiana University 2022 U.S. Midterms Multi-Platform Social Media Dataset (MEIU22), a collection of social media posts from Twitter, Facebook, Instagram, Reddit, and 4chan. MEIU22 links to posts about the midterm elections based on a comprehensive list of keywords and tracks the social media accounts of 1,011 candidates from October 1 to December 25, 2022. We also publish the source code of our pipeline to enable similar multi-platform research projects.
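The iterative snowball keyword expansion mentioned in the TL;DR can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the substring matching, and the `min_count` frequency threshold are all assumptions made for illustration, and this section does not say how candidate keywords were actually judged relevant. The sketch starts from seed keywords, finds posts that match them, and adds hashtags that co-occur often enough in those posts, repeating until no new keyword appears.

```python
import re
from collections import Counter

# Hashtag body: the word characters that follow '#'.
HASHTAG = re.compile(r"#(\w+)")


def matches(text, keywords):
    """Return True if any keyword occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def expand_keywords(posts, seeds, min_count=2, max_rounds=5):
    """Grow a keyword set by snowballing over co-occurring hashtags.

    ASSUMPTION: a hashtag is accepted once it appears in at least
    `min_count` matching posts. This threshold is a stand-in for
    whatever relevance check a real project would apply.
    """
    keywords = {seed.lower() for seed in seeds}
    for _ in range(max_rounds):
        counts = Counter()
        for text in posts:
            if matches(text, keywords):
                # Count each hashtag at most once per post.
                counts.update({tag.lower() for tag in HASHTAG.findall(text)})
        new = {tag for tag, n in counts.items()
               if n >= min_count and tag not in keywords}
        if not new:
            break  # Converged: no further hashtags pass the threshold.
        keywords |= new
    return keywords
```

Substring matching is deliberately crude (a short keyword can match inside unrelated words), so a real collection would likely pair such a loop with manual review of each round's candidates.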
