Do LLMs Track Public Opinion? A Multi-Model Study of Favorability Predictions in the 2024 U.S. Presidential Election

Riya Parikh, Sarah H. Cen, Chara Podimata

TL;DR

The paper investigates whether large language models can track public opinion as measured by traditional exit polls during the 2024 U.S. presidential cycle. Using the llm-election-data-2024 dataset, it compares nine LLM configurations against five ground-truth polls for Kamala Harris and Donald Trump, mapping model outputs to poll categories. The results reveal systematic directional miscalibration, with Harris consistently overpredicted in favorability (roughly 10–40 percentage points), while Trump shows smaller, poll-dependent biases and less cross-model variation; internet augmentation and 7-day rolling averages do not fully correct these errors. The findings imply that off-the-shelf LLMs are not reliable polling substitutes and underscore the need for calibration, ensembles, and careful model selection in any forecasting pipeline.
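The TL;DR mentions mapping model outputs to poll categories. The paper's exact mapping is not described here, so the following is only a minimal sketch under stated assumptions. It assumes each daily LLM response arrives as a dictionary of free-text labels to percentages, and it collapses those labels into "Favorable", "Unfavorable", and "Other" using hypothetical keyword rules (`CATEGORY_RULES`, `to_poll_category`, and `collapse` are illustrative names, not part of the llm-election-data-2024 dataset).

```python
from __future__ import annotations

# Hypothetical keyword rules, checked in order. Negated and "unfavorable"
# forms are tested before "favorable", because "favo" is a substring of both.
CATEGORY_RULES = [
    ("not favo", "Unfavorable"),
    ("unfavo", "Unfavorable"),
    ("favo", "Favorable"),
]


def to_poll_category(label: str) -> str:
    """Map one raw response label to 'Favorable', 'Unfavorable', or 'Other'."""
    text = label.strip().lower()
    for keyword, category in CATEGORY_RULES:
        if keyword in text:
            return category
    return "Other"


def collapse(response: dict[str, float]) -> dict[str, float]:
    """Sum label percentages into poll categories and renormalize to 100."""
    totals = {"Favorable": 0.0, "Unfavorable": 0.0, "Other": 0.0}
    for label, pct in response.items():
        totals[to_poll_category(label)] += pct
    grand_total = sum(totals.values())
    if grand_total == 0:
        return totals
    return {k: 100.0 * v / grand_total for k, v in totals.items()}


# Made-up response for illustration only (not from the dataset).
raw = {"Very favorable": 30, "Somewhat favorable": 20,
       "Somewhat unfavourable": 15, "Very unfavorable": 25, "No opinion": 10}
print(collapse(raw))  # {'Favorable': 50.0, 'Unfavorable': 40.0, 'Other': 10.0}
```

Renormalizing guards against model outputs whose percentages do not sum to 100; whether the paper renormalizes is not stated in this summary.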

Abstract

We investigate whether Large Language Models (LLMs) can track public opinion as measured by exit polls during the 2024 U.S. presidential election cycle. Our analysis focuses on headline favorability (e.g., "Favorable" vs. "Unfavorable") of presidential candidates across multiple LLMs queried daily throughout the election season. Using the publicly available llm-election-data-2024 dataset, we evaluate predictions from nine LLM configurations against a curated set of five high-quality polls from major organizations including Reuters, CNN, Gallup, Quinnipiac, and ABC. We find systematic directional miscalibration. For Kamala Harris, all models overpredict favorability by 10-40% relative to polls. For Donald Trump, biases are smaller (5-10%) and poll-dependent, with substantially lower cross-model variation. These deviations persist under temporal smoothing and are not corrected by internet-augmented retrieval. We conclude that off-the-shelf LLMs do not reliably track polls when queried in a straightforward manner and discuss implications for election forecasting.
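The abstract's central quantities are a directional (signed) error of time-averaged model predictions against each poll and the variation of that error across models. A gap can be reported either in percentage points or relative to the poll value, so the sketch below returns both. It is an illustration under assumptions, not the authors' code: the model names, poll names, and numbers are made-up placeholders, and `bias_table` and `cross_model_spread` are hypothetical helpers.

```python
from statistics import mean, pstdev

# Hypothetical daily favorability predictions (percent) per model configuration.
# Placeholder values only; these are not results from the paper.
daily_predictions = {
    "model_a": [52.0, 54.0, 53.0],
    "model_b": [60.0, 58.0, 62.0],
    "model_c": [49.0, 51.0, 50.0],
}

# Hypothetical poll toplines (percent favorable).
polls = {"poll_x": 45.0, "poll_y": 40.0}


def bias_table(preds, poll_values):
    """Return {poll: {model: (pp_error, relative_error)}}.

    pp_error is the model's time-averaged prediction minus the poll value,
    in percentage points; relative_error expresses that difference as a
    percentage of the poll value. Positive values mean overprediction.
    """
    table = {}
    for poll, truth in poll_values.items():
        table[poll] = {}
        for model, series in preds.items():
            diff = mean(series) - truth
            table[poll][model] = (diff, 100.0 * diff / truth)
    return table


def cross_model_spread(table):
    """Population std. dev. of the percentage-point error across models, per poll."""
    return {poll: pstdev([err for err, _ in row.values()])
            for poll, row in table.items()}


table = bias_table(daily_predictions, polls)
for poll, row in table.items():
    for model, (pp, rel) in row.items():
        print(f"{poll} {model}: {pp:+.1f} pp ({rel:+.1f}% of poll)")
print(cross_model_spread(table))
```

A uniformly positive error across models, as in this toy input, is what "directional miscalibration" looks like; the spread captures how much the models disagree about its size.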

Paper Structure (20 sections, 10 figures, 1 table)

Figures (10)

  • Figure 1: Kamala Harris favorability prediction averaged over time from LLMs (9 leftmost bars) vs actual polls (5 rightmost bars).
  • Figure 2: 7-day rolling average of LLM predictions of Kamala Harris favorability (solid lines) vs actual polls (dashed lines). Shaded regions correspond to when each poll was conducted.
  • Figure 3: Donald Trump favorability prediction averaged over time from LLMs (9 left bars) vs actual polls (5 right bars).
  • Figure 4: 7-day rolling average of LLM predictions of Donald Trump favorability (solid lines) vs actual polls (dashed lines). Shaded regions correspond to when each poll was conducted.
  • Figure 5: Average-over-time predictions of Kamala Harris unfavorability (9 left bars) vs actual polls (5 right bars).
  • ...and 5 more figures
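Figures 2 and 4 plot 7-day rolling averages of the LLM predictions against polls, with shaded regions marking each poll's field period. The sketch below shows one way to compute such a smoothed series and the mean gap to a poll inside its field window. Several choices are assumptions rather than the paper's documented setup: the window is trailing (the current day plus the six before it), missing days are skipped rather than imputed, and the dates and values are made-up placeholders. `rolling_mean` and `field_period_gap` are hypothetical helpers.

```python
from __future__ import annotations

from datetime import date, timedelta
from statistics import mean


def rolling_mean(daily: dict[date, float], window: int = 7) -> dict[date, float]:
    """Trailing mean over the last `window` calendar days.

    Output is produced only for days that have a prediction; days with no
    prediction inside a window are skipped rather than imputed.
    """
    out = {}
    for day in sorted(daily):
        start = day - timedelta(days=window - 1)
        vals = [v for d, v in daily.items() if start <= d <= day]
        out[day] = mean(vals)
    return out


def field_period_gap(smoothed, poll_value, start, end):
    """Mean of (smoothed prediction - poll value) over days in [start, end]."""
    gaps = [v - poll_value for d, v in smoothed.items() if start <= d <= end]
    if not gaps:
        raise ValueError("no smoothed predictions inside the poll's field period")
    return mean(gaps)


# Made-up daily predictions for ten consecutive days (placeholders only).
daily = {date(2024, 10, 1) + timedelta(days=i): 50.0 + i for i in range(10)}
smoothed = rolling_mean(daily)

# Hypothetical poll: 45% favorable, fielded Oct 8-10.
gap = field_period_gap(smoothed, 45.0, date(2024, 10, 8), date(2024, 10, 10))
print(f"mean gap over field period: {gap:+.1f} pp")  # mean gap over field period: +10.0 pp
```

The quadratic scan keeps the sketch dependency-free; for a full season of daily data, a time-indexed rolling window in a dataframe library would be the more efficient choice. Smoothing reduces day-to-day noise but cannot remove a persistent offset, which is consistent with the abstract's observation that the deviations survive temporal smoothing.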