A Large-Scale Simulation on Large Language Models for Decision-Making in Political Science
Chenxiao Yu, Jinyi Ye, Yuangang Li, Zheng Li, Emilio Ferrara, Xiyang Hu, Yue Zhao
TL;DR
This paper asks whether large language models can reliably simulate voter decision-making in political contexts, and introduces a theory-driven multi-step reasoning pipeline that integrates demographic, temporal, and ideological information. It grounds the simulations in SynC synthetic data and real ANES benchmarks, and evaluates multiple LLMs across three pipeline variants, finding that the multi-step ideology-inference variant (V3) aligns most closely with real voting patterns and is robust across models. The study also reveals residual issues, namely systematic biases, amplified demographic stereotypes, and overestimated ideological influence, which point to the need for debiasing and human-in-the-loop techniques. Overall, the work provides a scalable framework for modeling political decision-making with LLMs and outlines practical directions for improving the reliability and fairness of such simulations.
Abstract
LLMs have demonstrated remarkable capabilities in text generation and reasoning, but their ability to simulate human decision-making, particularly in political contexts, remains an open question. Modeling voter behavior presents unique challenges due to limited voter-level data, evolving political landscapes, and the complexity of human reasoning. In this study, we develop a theory-driven, multi-step reasoning framework that integrates demographic, temporal, and ideological factors to simulate voter decision-making at scale. Using synthetic personas calibrated to real-world voter data, we conduct large-scale simulations of recent U.S. presidential elections. Our method significantly improves simulation accuracy while mitigating model biases. We examine its robustness by comparing performance across different LLMs, and we investigate the challenges and constraints that arise in LLM-based political simulations. Our work provides both a scalable framework for modeling political decision-making behavior and insights into the promise and limitations of using LLMs in political science research.
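The multi-step idea described above (infer an ideology from demographic and temporal context first, then predict a vote conditioned on it) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Persona` fields, the prompt wording, the function names, and the `stub_llm` stand-in for a real model call are all hypothetical, chosen only to show the shape of such a pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical LLM interface: takes a prompt string, returns the model's text.
LLM = Callable[[str], str]


@dataclass
class Persona:
    # Illustrative demographic fields; the real persona schema is not given here.
    age: int
    gender: str
    race: str
    state: str


def infer_ideology(persona: Persona, year: int, llm: LLM) -> str:
    """Step 1: ask the model for an ideology label given demographics and time."""
    prompt = (
        f"It is {year}. A voter is a {persona.age}-year-old {persona.race} "
        f"{persona.gender} living in {persona.state}. Answer with one word "
        "(liberal, moderate, or conservative): "
        "what is this voter's likely political ideology?"
    )
    return llm(prompt).strip().lower()


def predict_vote(persona: Persona, year: int, ideology: str,
                 candidates: List[str], llm: LLM) -> str:
    """Step 2: ask for a vote, conditioning on the ideology inferred in step 1."""
    prompt = (
        f"It is {year}. A voter is a {persona.age}-year-old {persona.race} "
        f"{persona.gender} living in {persona.state}. "
        f"Their inferred leaning is {ideology}. "
        f"Which candidate do they vote for: {', '.join(candidates)}? "
        "Answer with the candidate name only."
    )
    answer = llm(prompt).strip()
    # Anything outside the candidate list is recorded as invalid, not guessed.
    return answer if answer in candidates else "invalid"


def simulate(personas: List[Persona], year: int,
             candidates: List[str], llm: LLM) -> Dict[str, float]:
    """Run both steps for every persona and return aggregate vote shares."""
    votes = Counter(
        predict_vote(p, year, infer_ideology(p, year, llm), candidates, llm)
        for p in personas
    )
    total = sum(votes.values())
    return {choice: count / total for choice, count in votes.items()}


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a model call, used only to exercise the code."""
    if "political ideology?" in prompt:
        return "Conservative" if "Texas" in prompt else "Liberal"
    return "Candidate B" if "conservative" in prompt else "Candidate A"
```

In practice `stub_llm` would be replaced by a call to an actual model, and the resulting shares would be compared against a benchmark such as ANES. Separating the two steps makes the intermediate ideology label inspectable, which is one way to check for the stereotype amplification and overestimated ideological influence noted in the TL;DR.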
