Do LLMs Track Public Opinion? A Multi-Model Study of Favorability Predictions in the 2024 U.S. Presidential Election
Riya Parikh, Sarah H. Cen, Chara Podimata
TL;DR
The paper investigates whether large language models can track public opinion as measured by traditional exit polls during the 2024 U.S. presidential cycle. Using the llm-election-data-2024 dataset, it compares nine LLM configurations against five ground-truth polls for Kamala Harris and Donald Trump, mapping model outputs to poll categories. The results reveal systematic directional miscalibration: Harris's favorability is consistently overpredicted (by roughly 10–40 percentage points), while Trump shows smaller, poll-dependent biases and less cross-model variation. Neither internet augmentation nor 7-day rolling averages fully corrects these errors. The findings imply that off-the-shelf LLMs are not reliable polling substitutes and underscore the need for calibration, ensembles, and careful model selection in any forecasting pipeline.
Abstract
We investigate whether Large Language Models (LLMs) can track public opinion as measured by exit polls during the 2024 U.S. presidential election cycle. Our analysis focuses on headline favorability (e.g., "Favorable" vs. "Unfavorable") of presidential candidates across multiple LLMs queried daily throughout the election season. Using the publicly available llm-election-data-2024 dataset, we evaluate predictions from nine LLM configurations against a curated set of five high-quality polls from major organizations including Reuters, CNN, Gallup, Quinnipiac, and ABC. We find systematic directional miscalibration. For Kamala Harris, all models overpredict favorability by 10-40 percentage points relative to polls. For Donald Trump, biases are smaller (5-10 percentage points) and poll-dependent, with substantially lower cross-model variation. These deviations persist under temporal smoothing and are not corrected by internet-augmented retrieval. We conclude that off-the-shelf LLMs do not reliably track polls when queried in a straightforward manner and discuss implications for election forecasting.
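As a rough illustration of the kind of comparison described above, the following minimal sketch computes a signed favorability error (model prediction minus poll value, in percentage points) after a trailing 7-day rolling mean. It is not the authors' pipeline: the function names, the handling of the first few days of the window, and all numbers are hypothetical placeholders chosen only to make the arithmetic concrete.

```python
from statistics import mean


def signed_error(predicted_pct, poll_pct):
    """Signed error in percentage points; positive means the model overpredicts."""
    return predicted_pct - poll_pct


def rolling_mean(values, window=7):
    """Trailing rolling mean; early entries average over the days available so far."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed


# Hypothetical daily "Favorable" predictions (percent) from one model; not real data.
daily_predictions = [62.0, 58.0, 65.0, 60.0, 61.0, 59.0, 62.0, 64.0]
# Hypothetical poll "Favorable" share (percent); not a real poll result.
poll_favorable = 46.0

smoothed = rolling_mean(daily_predictions, window=7)
errors = [signed_error(p, poll_favorable) for p in smoothed]
```

A consistently positive `errors` series after smoothing would correspond to the persistent overprediction the abstract reports; a series that fluctuates around zero would indicate that day-to-day noise, rather than directional bias, drives the raw deviations.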
