FlockVote: LLM-Empowered Agent-Based Modeling for Simulating U.S. Presidential Elections

Lingfeng Zhou, Yi Xu, Zhenyu Wang, Dequan Wang

TL;DR

FlockVote tackles the challenge of balancing predictive capability with interpretability in election modeling by using LLM-driven agents endowed with high-fidelity demographic profiles and contextual policy information. The framework treats agent reasoning as a "computational laboratory", pairing macro-level fidelity to real election outcomes with micro-level insight into agent rationales. Through macro-level replication, micro-level interpretability, and systematic reliability analyses (bias, instability, and context sensitivity), the work demonstrates both the potential and the current limitations of LLM agents in high-stakes social simulations. The authors advocate for broader applications beyond politics and propose methodological steps toward auditing and mitigating model flaws to advance computational social science.

Abstract

Modeling complex human behavior, such as voter decisions in national elections, is a long-standing challenge for computational social science. Traditional agent-based models (ABMs) are limited by oversimplified rules, while large-scale statistical models often lack interpretability. We introduce FlockVote, a novel framework that uses Large Language Models (LLMs) to build a "computational laboratory" of LLM agents for political simulation. Each agent is instantiated with a high-fidelity demographic profile and dynamic contextual information (e.g. candidate policies), enabling it to perform nuanced, generative reasoning to simulate a voting decision. We deploy this framework as a testbed on the 2024 U.S. Presidential Election, focusing on seven key swing states. Our simulation's macro-level results successfully replicate the real-world outcome, demonstrating the high fidelity of our "virtual society". The primary contribution is not only the prediction, but also the framework's utility as an interpretable research tool. FlockVote moves beyond black-box outputs, allowing researchers to probe agent-level rationale and analyze the stability and sensitivity of LLM-driven social simulations.
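The agent loop described in the abstract (profile, context, generated decision, aggregation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `MARGINALS` table, the prompt wording, the `AgentProfile` fields, and the `stub_llm` function are all hypothetical stand-ins, and a real run would replace `stub_llm` with an actual model call.

```python
import random
from collections import Counter
from dataclasses import dataclass, asdict

# Hypothetical marginal distributions, for illustration only (not real census data).
MARGINALS = {
    "age_group": {"18-29": 0.2, "30-64": 0.6, "65+": 0.2},
    "sex": {"female": 0.5, "male": 0.5},
    "education": {"no college degree": 0.6, "college degree": 0.4},
}


@dataclass(frozen=True)
class AgentProfile:
    state: str
    age_group: str
    sex: str
    education: str


def sample_profile(state, rng):
    """Draw each demographic dimension independently from its marginal (an assumption)."""
    picks = {
        dim: rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        for dim, dist in MARGINALS.items()
    }
    return AgentProfile(state=state, **picks)


def build_prompt(profile, context):
    """Combine the demographic profile and contextual information into one prompt."""
    traits = ", ".join(f"{key}: {value}" for key, value in asdict(profile).items())
    return (
        f"You are a voter with this profile: {traits}.\n"
        f"Context: {context}\n"
        "Which candidate do you vote for, and why?"
    )


def stub_llm(prompt):
    """Placeholder for a real LLM call; always returns the same answer."""
    return {"vote": "Candidate A", "reason": "placeholder rationale"}


def simulate_state(state, n_agents, context, llm=stub_llm, seed=0):
    """Run n_agents simulated voters and return the vote tally plus their stated reasons."""
    rng = random.Random(seed)
    tally = Counter()
    reasons = []
    for _ in range(n_agents):
        profile = sample_profile(state, rng)
        reply = llm(build_prompt(profile, context))
        tally[reply["vote"]] += 1
        reasons.append(reply["reason"])
    return tally, reasons
```

Keeping the per-agent reasons alongside the tally is what makes the micro-level inspection possible: the tally gives the macro-level outcome, while the reasons can be aggregated or read individually.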

Paper Structure

This paper contains 37 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Comparison of Social Simulation Methodologies. (Left) Conventional statistical models are often "black boxes" that correlate data-driven factors with outcomes but lack causal or behavioral interpretability. (Center) Traditional agent-based modeling (ABM) relies on agents that follow predefined, heuristic rules. This limits their behavioral realism and ability to adapt to new contextual information. (Right) LLM-powered agent-based modeling, the approach used in our FlockVote framework, serves as a "computational laboratory". It empowers autonomous agents with demographic profiles and dynamic context, enabling them to simulate complex, human-like reasoning. This provides a flexible, nuanced, and interpretable simulation essential for social science inquiry.
  • Figure 2: Comparison Between the Predicted Outcomes and the Actual Election Results. On the maps of the United States, red represents states won by Republicans and blue represents states won by Democrats. The two maps differ only in Nevada: our simulation predicts a Democratic win there by a margin of only 0.17%, whereas Republicans won the state in the actual election. Given how narrow the predicted margin is, the forecast comes close to the real outcome even in this state.
  • Figure 3: Word Cloud of Agents' Reasons. Aggregated reasons provided by agents for their predicted votes. Key terms such as "stance", "abortion", "economic", and "candidate" dominate the word cloud, highlighting the primary factors influencing voter behavior in this swing state.
  • Figure 4: Ablation study on agent number stability in Pennsylvania. The figure illustrates the impact of different agent numbers (10, 100, 200, 300, 500, 1000, and 2000 agents) on the stability of simulation results, with each agent number repeated over 10 trials using distinct random seeds to generate unique agent profiles. Results show that predictions stabilize when the agent number reaches 300, after which further increases in agent number yield minimal fluctuations. Based on this finding, 300 agents per state are selected for subsequent experiments to balance stability and computational efficiency.
  • Figure 5: Proportions of different race and sex groups in Pennsylvania. The pie chart shows that a 300-agent simulation covers every population group: the other dimensions are modeled independently, and each group's proportion within any given dimension is at least 1%.
  • ...and 2 more figures
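The reasoning behind the agent-number ablation (Figure 4) and the 1% coverage argument (Figure 5) can be illustrated with a small sampling sketch. It is not a reproduction of the paper's experiment: each agent's vote is replaced by an independent coin flip with a hypothetical underlying share `TRUE_SHARE`, so it only shows how the trial-to-trial spread shrinks as the number of agents grows. The helper `prob_group_represented` likewise assumes agents are sampled independently.

```python
import random
import statistics

AGENT_COUNTS = [10, 100, 200, 300, 500, 1000, 2000]  # agent numbers listed in Figure 4
N_TRIALS = 10      # trials per agent number, as in Figure 4
TRUE_SHARE = 0.5   # hypothetical underlying vote share (assumption)


def simulated_share(n_agents, rng):
    """Vote share for one trial, with each agent's vote modeled as a coin flip."""
    votes = sum(1 for _ in range(n_agents) if rng.random() < TRUE_SHARE)
    return votes / n_agents


def spread_by_agent_count(seed=0):
    """Standard deviation of the simulated share across trials, per agent number."""
    rng = random.Random(seed)
    spread = {}
    for n in AGENT_COUNTS:
        shares = [simulated_share(n, rng) for _ in range(N_TRIALS)]
        spread[n] = statistics.stdev(shares)
    return spread


def prob_group_represented(group_share, n_agents):
    """Probability that at least one of n independently sampled agents is in the group."""
    return 1.0 - (1.0 - group_share) ** n_agents


if __name__ == "__main__":
    for n, sd in spread_by_agent_count().items():
        print(f"{n:>5} agents: stdev of share across trials = {sd:.4f}")
    print(f"P(1% group appears among 300 agents) = {prob_group_represented(0.01, 300):.3f}")
```

Under these assumptions, a group with a 1% share contributes about 3 of 300 agents in expectation, and the chance that it appears at least once is roughly 0.95.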