The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents
Yun-Shiuan Chuang, Siddharth Suresh, Nikunj Harlalka, Agam Goyal, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers
TL;DR
This paper investigates whether the wisdom of partisan crowds (WOC) extends to groups of LLM-based agents role-playing Democrat and Republican personas. It adopts a Becker-style benchmark to quantify WOC, partisan bias, and human-likeness, using two LLMs (ChatGPT and Vicuna) under varying prompting conditions (detailed vs. simple personas; with vs. without chain-of-thought), plus supervised fine-tuning on human data. Key findings: LLM agents exhibit WOC-like error reduction when prompted with detailed personas and without chain-of-thought, whereas chain-of-thought prompting attenuates WOC but enhances human-like partisan bias. Fine-tuning further improves human-like dynamics but can lead to overfitting when evaluated on unseen questions. The work demonstrates both the potential and the limitations of LLM-based agents as models or simulators of human collective intelligence, and highlights how human data can guide the design of socially intelligent AI agents.
Abstract
Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias -- a phenomenon known as the "wisdom of partisan crowds." Generative agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human groups. In this paper, we examine the extent to which the wisdom of partisan crowds emerges in groups of LLM-based agents that are prompted to role-play as partisan personas (e.g., Democrat or Republican). We find that they not only display human-like partisan biases, but also converge to more accurate beliefs through deliberation, as humans do. We then identify several factors that interfere with convergence, including the use of chain-of-thought prompting and a lack of detail in personas. Conversely, fine-tuning on human data appears to enhance convergence. These findings show the potential and limitations of LLM-based agents as a model of human collective intelligence.
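
To make "converging to more accurate beliefs" concrete, the following is a minimal sketch of how error reduction and partisan bias could be measured for numeric estimates. It is an illustration under stated assumptions, not the paper's exact metrics: the function names (`group_error`, `woc_gain`, `partisan_gap`), the use of the group mean, and all numbers are hypothetical.

```python
from statistics import mean


def group_error(estimates, truth):
    """Absolute error of the group's mean estimate (assumed error measure)."""
    return abs(mean(estimates) - truth)


def woc_gain(first_round, last_round, truth):
    """Error reduction between rounds; positive means the group mean moved closer to the truth."""
    return group_error(first_round, truth) - group_error(last_round, truth)


def partisan_gap(dem_estimates, rep_estimates):
    """Signed difference between the two groups' mean estimates (assumed bias measure)."""
    return mean(dem_estimates) - mean(rep_estimates)


# Hypothetical estimates for one question, before and after deliberation.
truth = 50.0
dem_first, dem_last = [30.0, 35.0, 40.0], [40.0, 42.0, 44.0]
rep_first, rep_last = [70.0, 75.0, 80.0], [60.0, 62.0, 64.0]

dem_gain = woc_gain(dem_first, dem_last, truth)  # Democrat group error: 15 -> 8
rep_gain = woc_gain(rep_first, rep_last, truth)  # Republican group error: 25 -> 12
gap_first = partisan_gap(dem_first, rep_first)   # gap before deliberation
gap_last = partisan_gap(dem_last, rep_last)      # gap after deliberation
```

In this toy example both groups start on opposite sides of the true value and both reduce their error after deliberation, while the gap between them narrows without disappearing. That is the qualitative pattern the abstract describes as the wisdom of partisan crowds.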
