Analyzing and Estimating Support for U.S. Presidential Candidates in Twitter Polls

Stephen Scarano, Vijayalakshmi Vasudevan, Chhandak Bagchi, Mattia Samory, JungHwan Yang, Przemyslaw A. Grabowicz

TL;DR

This paper investigates Twitter polls as a data source for gauging public opinion on U.S. presidential candidates during the 2016 and 2020 campaigns. It combines a large Twitter-poll corpus with inferred user attributes (age, gender, ideology, location, bot-likeness) and validates these inferences against human judgments, then uses regression and model-based poststratification to correct for biases. The study finds systematic biases in social polls—candidate ordering, demographic overrepresentation, and bot activity—leading to inflated Trump support relative to traditional polls, but shows that bias-corrected poststratified estimates align closely with election outcomes (errors near 1–2%). These results demonstrate the potential of social polls to complement mainstream polls, provided robust bias-correction methods are applied, while also underscoring ethical considerations around platform transparency and data privacy.

Abstract

Polls posted on social media have emerged in recent years as an important tool for estimating public opinion, e.g., to gauge public support for business decisions and political candidates in national elections. Here, we examine nearly two thousand Twitter polls gauging support for U.S. presidential candidates during the 2016 and 2020 election campaigns. First, we describe the rapidly emerging prevalence of social polls. Second, we characterize social polls in terms of their heterogeneity and response options. Third, leveraging machine learning models for user attribute inference, we describe the demographics, political leanings, and other characteristics of the users who author and interact with social polls. Finally, we study the relationship between social poll results, their attributes, and the characteristics of users interacting with them. Our findings reveal that Twitter polls are biased in various ways, starting from the position of the presidential candidates among the poll options to biases in demographic attributes and poll results. The 2016 and 2020 polls were predominantly crafted by older males and manifested a pronounced bias favoring candidate Donald Trump, in contrast to traditional surveys, which favored Democratic candidates. We further identify and explore the potential reasons for such biases in social polling and discuss their potential repercussions. Finally, we show that biases in social media polls can be corrected via regression and poststratification. The errors of the resulting election estimates can be as low as 1%-2%, suggesting that social media polls can become a promising source of information about public opinion.
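The poststratification step mentioned above can be illustrated with a minimal sketch. It assumes that per-group support has already been estimated (for example, by a regression) and only shows the reweighting by population shares. The group names, support values, and counts are hypothetical and chosen for illustration; they are not taken from the paper, and this is not the authors' model.

```python
def weighted_mean(values, weights):
    """Average `values` over groups, weighting each group by `weights`."""
    total = sum(weights.values())
    return sum(values[g] * weights[g] / total for g in weights)


# Hypothetical per-group support for one candidate (illustrative only).
support = {"18-29": 0.40, "30-49": 0.50, "50+": 0.60}

# Hypothetical composition of poll respondents: skewed toward older users.
sample_counts = {"18-29": 10, "30-49": 20, "50+": 70}

# Hypothetical composition of the target population (e.g., the electorate).
population_counts = {"18-29": 20, "30-49": 30, "50+": 50}

# Raw estimate: groups weighted by how often they appear in the sample.
raw_estimate = weighted_mean(support, sample_counts)

# Poststratified estimate: groups weighted by their population share.
poststratified_estimate = weighted_mean(support, population_counts)

print(f"raw: {raw_estimate:.2f}, poststratified: {poststratified_estimate:.2f}")
```

In this toy setup the overrepresented older group pulls the raw estimate upward, and reweighting to the population shares lowers it. That is the kind of correction the abstract describes, although the paper's actual procedure is model-based and uses more attributes.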

Paper Structure

This paper contains 29 sections, 3 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: The two Twitter polls with the highest number of votes in the dataset.
  • Figure 2: A comparison of the raw (hatched bars) and normalized (bars without hatching) outcomes of mainstream polls and respective presidential elections. We unify all polls and election outcomes to focus on the head-to-head race between the two main presidential election candidates by dropping votes for other candidates and response options and normalizing the fractions of responses for the two main candidates. This simple operation reduces the gap between the average mainstream poll outcomes (orange) and election outcomes (red), i.e., the orange and red bars without hatching have similar heights.
  • Figure 3: The number of Twitter and mainstream polls published throughout the 2016 (top) and 2020 (bottom) U.S. presidential election years.
  • Figure 4: Distributions of the number of votes in Twitter and mainstream polls gauging support for the 2016 (top) and 2020 (bottom) U.S. presidential candidates.
  • Figure 5: Breakdown of the number of Twitter poll options.
  • ...and 10 more figures
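
The head-to-head normalization described in the Figure 2 caption (dropping votes for other candidates and response options, then renormalizing the two main candidates' shares) can be sketched as follows. The function name and the example shares are hypothetical and serve only to illustrate the arithmetic.

```python
def head_to_head(share_a, share_b):
    """Renormalize two candidates' shares so they sum to 1.

    Shares for other candidates and response options are dropped simply
    by not being passed in.
    """
    total = share_a + share_b
    if total <= 0:
        raise ValueError("The two main candidates must have positive total share.")
    return share_a / total, share_b / total


# Hypothetical poll: 45% for candidate A, 40% for candidate B, 15% other.
norm_a, norm_b = head_to_head(0.45, 0.40)
print(f"A: {norm_a:.3f}, B: {norm_b:.3f}")
```

After normalization the two shares sum to one, so polls and election outcomes that differ in their share of "other" responses become directly comparable.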