
A Large-Scale Simulation on Large Language Models for Decision-Making in Political Science

Chenxiao Yu, Jinyi Ye, Yuangang Li, Zheng Li, Emilio Ferrara, Xiyang Hu, Yue Zhao

TL;DR

This paper asks whether large language models can reliably simulate voter decision-making in political contexts, and introduces a theory-driven, multi-step reasoning pipeline that integrates demographic, temporal, and ideological information. It grounds the simulation in synthetic personas aligned with real voter data via SynC, benchmarks against the American National Election Studies (ANES), and evaluates multiple LLMs across three pipeline variants. The multi-step ideology-inference variant (V3) aligns most closely with real voting patterns and is robust across models. The study also reveals residual issues, including systematic biases, amplification of demographic stereotypes, and overestimated ideological influence, which point to the need for debiasing and human-in-the-loop techniques. Overall, the work provides a scalable framework for modeling political decision-making with LLMs and outlines practical directions for improving the reliability and fairness of such simulations.
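The three pipeline variants differ mainly in how much context each prompt carries. The sketch below illustrates that progression under loudly stated assumptions: the `llm` callable, the `Persona` fields, and all prompt wording are hypothetical and do not come from the paper. V1 sends only the persona, V2 adds the election year and candidates, and V3 is reduced to a two-step simplification that first asks the model to infer the persona's ideology and then conditions the vote on that answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical interface: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]

ANSWER_FORMAT = " Answer with exactly one word: 'Democrat' or 'Republican'."


@dataclass
class Persona:
    """Hypothetical demographic fields for one synthetic voter."""
    age: int
    gender: str
    race: str
    education: str
    state: str

    def describe(self) -> str:
        return (f"You are a {self.age}-year-old {self.race} {self.gender} "
                f"with {self.education} education, living in {self.state}.")


def election_context(year: int, candidates: Dict[str, str]) -> str:
    """Temporal context that V2 and V3 add on top of the static persona."""
    return (f" The year is {year}. The presidential candidates are "
            f"{candidates['Democrat']} (Democrat) and "
            f"{candidates['Republican']} (Republican).")


def v1_demographic_only(persona: Persona, llm: LLM) -> str:
    # Static persona only: no election-year information at all.
    return llm(persona.describe()
               + " Which party's presidential candidate would you vote for?"
               + ANSWER_FORMAT)


def v2_time_based(persona: Persona, year: int,
                  candidates: Dict[str, str], llm: LLM) -> str:
    # Same question, but the prompt now carries the election year and candidates.
    prompt = persona.describe() + election_context(year, candidates)
    return llm(prompt + " Which candidate would you vote for?" + ANSWER_FORMAT)


def v3_multi_step(persona: Persona, year: int,
                  candidates: Dict[str, str], llm: LLM) -> str:
    base = persona.describe() + election_context(year, candidates)
    # Step 1: infer a political ideology from demographics and context.
    ideology = llm(base + " What is your political ideology? Reply with one label, "
                   "from 'very liberal' to 'very conservative'.").strip()
    # Step 2: decide the vote, conditioned on the inferred ideology.
    return llm(base + f" Your political ideology is {ideology}. "
               "Which candidate would you vote for?" + ANSWER_FORMAT)


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model call (hypothetical)."""
    return "moderate" if "ideology?" in prompt else "Democrat"


voter = Persona(45, "woman", "White", "college", "Ohio")
cands = {"Democrat": "Joe Biden", "Republican": "Donald Trump"}
print(v3_multi_step(voter, 2020, cands, stub_llm))
```

Passing the model in as a plain callable keeps the pipeline logic separate from any particular provider, so the same persona can be run through several LLMs, which is the kind of cross-model robustness comparison the summary describes.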

Abstract

While LLMs have demonstrated remarkable capabilities in text generation and reasoning, their ability to simulate human decision-making -- particularly in political contexts -- remains an open question. Modeling voter behavior presents unique challenges due to limited voter-level data, evolving political landscapes, and the complexity of human reasoning. In this study, we develop a theory-driven, multi-step reasoning framework that integrates demographic, temporal, and ideological factors to simulate voter decision-making at scale. Using synthetic personas calibrated to real-world voter data, we conduct large-scale simulations of recent U.S. presidential elections. Our method significantly improves simulation accuracy while mitigating model biases. We examine its robustness by comparing performance across different LLMs. We further investigate the challenges and constraints that arise from LLM-based political simulations. Our work provides both a scalable framework for modeling political decision-making behavior and insights into the promise and limitations of using LLMs in political science research.
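Simulation accuracy here comes down to comparing aggregated persona votes with real outcomes; Figure 2 in the paper plots a "predicted voting ratio" against ANES ground truth. The following is a minimal sketch, assuming the ratio is each candidate's (optionally weighted) share of simulated persona votes and that alignment is scored by mean absolute error. The weights, the metric, and the function names are hypothetical, the paper's own equation may differ, and the numbers are illustrative rather than results.

```python
from collections import Counter
from typing import Dict, List, Optional


def voting_ratio(votes: List[str],
                 weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Each candidate's share of the (optionally weighted) simulated votes.

    Assumption: one categorical vote per synthetic persona; the paper's
    exact voting-ratio equation may differ.
    """
    if not votes:
        raise ValueError("need at least one simulated vote")
    if weights is None:
        weights = [1.0] * len(votes)
    if len(weights) != len(votes):
        raise ValueError("votes and weights must have the same length")
    totals: Counter = Counter()
    for vote, weight in zip(votes, weights):
        totals[vote] += weight
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total weight must be positive")
    return {candidate: total / grand_total for candidate, total in totals.items()}


def alignment_error(predicted: Dict[str, float],
                    ground_truth: Dict[str, float]) -> float:
    """Mean absolute gap between predicted and true shares (hypothetical metric)."""
    candidates = set(predicted) | set(ground_truth)
    return sum(abs(predicted.get(c, 0.0) - ground_truth.get(c, 0.0))
               for c in candidates) / len(candidates)


# Illustrative numbers only, not ANES results.
simulated = ["Democrat", "Republican", "Democrat", "Republican", "Democrat"]
truth = {"Democrat": 0.52, "Republican": 0.48}
predicted = voting_ratio(simulated)
print(predicted)
print(round(alignment_error(predicted, truth), 3))
```

Computing the ratio separately for each pipeline (V1, V2, V3) and election year and then comparing the errors mirrors the layout of Figure 2; any other distance, such as the gap in a single candidate's share, would slot in by replacing `alignment_error`.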

Paper Structure

This paper contains 33 sections, 7 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Progressive design of LLM pipelines for voter simulation. V1: Demographic-Only Prompting uses static personas but lacks temporal context. V2: Time-Based Prompting adds election-year data. V3: Multi-Step Reasoning structures decision-making into steps, improving reasoning and alignment.
  • Figure 2: Comparison of the three pipelines on ANES 2016 and 2020. The y-axis represents the predicted voting ratio, as defined in the paper. The red baseline indicates the ground-truth voting ratios from the ANES dataset.
  • Figure 3: Comparison of LLM-simulated voting patterns by gender and race against real human data from the Pew Report.
  • Figure 4: Logistic regression analysis of political ideology and voting preference, comparing LLM simulations with real human data (ANES).
  • Figure A1: LLM simulations for four states in the 2020 election compared with ground-truth results. The figure presents results for one red state (Ohio, OH), one blue state (Illinois, IL), one swing state (Wisconsin, WI), and one tipping-point state (Florida, FL). The V1 and V2 pipelines tend to underestimate Republican support, while V3 (Multi-Step Reasoning) provides the closest alignment with actual outcomes, especially in swing and tipping-point states.
  • ...and 5 more figures
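Figure 4 compares logistic regressions of voting preference on political ideology for LLM simulations and ANES respondents. The following is a minimal sketch, assuming a binary outcome (1 = Republican vote), a 7-point ideology scale like ANES's liberal-conservative self-placement, and plain batch gradient descent on log-loss rather than whatever estimator the authors used. The per-level counts are made-up toy data, not ANES responses or model outputs.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x: List[float], y: List[int], lr: float = 0.5,
                 epochs: int = 5000) -> Tuple[float, float]:
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on log-loss."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # derivative of log-loss w.r.t. the logit
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def expand(rep_counts: List[int], n_per_level: int = 4) -> Tuple[List[float], List[int]]:
    """Turn Republican-vote counts for ideology levels 1..7 into (x, vote) pairs.

    Assumes exactly seven levels; subtracting 4 centers the scale at its
    midpoint, which speeds up gradient descent and makes b0 interpretable.
    """
    xs: List[float] = []
    ys: List[int] = []
    for level, n_rep in enumerate(rep_counts, start=1):
        xs += [float(level - 4)] * n_per_level
        ys += [1] * n_rep + [0] * (n_per_level - n_rep)
    return xs, ys


# Made-up toy counts (Republican votes out of 4 per ideology level), not real data.
human_x, human_y = expand([0, 1, 1, 2, 3, 3, 4])
llm_x, llm_y = expand([0, 0, 1, 2, 3, 4, 4])

_, slope_human = fit_logistic(human_x, human_y)
_, slope_llm = fit_logistic(llm_x, llm_y)
print(f"human slope {slope_human:.2f}, simulated slope {slope_llm:.2f}")
```

A steeper simulated slope than the human slope is the pattern the summary describes as overestimated ideological influence; the toy counts are constructed only to show what that contrast looks like, not to reproduce the paper's estimates.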