LLM Voting: Human Choices and AI Collective Decision Making
Joshua C. Yang, Damian Dailisan, Marcin Korecki, Carina I. Hausladen, Dirk Helbing
TL;DR
This work investigates how large language models (LLMs) vote in a participatory budgeting context and how closely their votes align with those of human voters. The study compares GPT-4 Turbo and LLaMA-2 across four voting methods while varying temperature, list order, and persona and Chain-of-Thought (CoT) prompts, and it quantifies alignment with human preferences using Kendall's tau and Jaccard metrics. Key findings show that LLMs exhibit bias and limited voting diversity, and that presentation order and persona affect outcomes. CoT does not improve alignment but can enhance explainability, and there is a trade-off between diversity and accuracy. The results underscore the need for careful, human-in-the-loop integration of LLMs in democratic processes to mitigate biases and preserve democratic legitimacy.
Abstract
This paper investigates the voting behavior of Large Language Models (LLMs), specifically GPT-4 and LLaMA-2: their biases and how well they align with human voting patterns. We used a dataset from a human voting experiment to establish a baseline for human preferences and conducted a corresponding experiment with LLM agents. We observed that the choice of voting method and the presentation order influenced LLM voting outcomes. We found that varying the persona can reduce some of these biases and enhance alignment with human choices. While the Chain-of-Thought approach did not improve prediction accuracy, it has potential for AI explainability in the voting process. We also identified a trade-off between preference diversity and alignment accuracy in LLMs, influenced by different temperature settings. Our findings indicate that LLMs may lead to less diverse collective outcomes and biased assumptions when used in voting scenarios, emphasizing the need for cautious integration of LLMs into democratic processes.
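As a rough illustration of how alignment between an LLM ballot and a human ballot might be quantified, the sketch below computes a Jaccard similarity over sets of selected projects and a Kendall's tau (the tau-a variant, assuming no ties) over two full rankings of the same projects. The function names and project labels are hypothetical, and the exact variants and aggregation used in the paper are not specified here, so this should be read as one plausible formulation rather than the authors' implementation.

```python
from itertools import combinations


def jaccard(selected_a, selected_b):
    """Jaccard similarity between two sets of selected projects."""
    a, b = set(selected_a), set(selected_b)
    if not a and not b:
        # Assumption: two empty ballots are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau-a between two rankings of the same items (no ties).

    Each ranking is a list ordered from most to least preferred.
    """
    if len(set(ranking_a)) != len(ranking_a) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must order the same distinct items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items to compare")

    position_b = {item: i for i, item in enumerate(ranking_b)}
    concordant = discordant = 0
    # combinations() yields pairs (x, y) with x ranked above y in ranking_a.
    for x, y in combinations(ranking_a, 2):
        if position_b[x] < position_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical ballots: project names are made up for illustration.
human_selection = ["park", "bike lane", "library"]
llm_selection = ["park", "library", "school"]
print(jaccard(human_selection, llm_selection))

human_ranking = ["park", "library", "school"]
llm_ranking = ["park", "school", "library"]
print(kendall_tau(human_ranking, llm_ranking))
```

Jaccard suits approval-style ballots, where only the set of chosen projects matters, while Kendall's tau suits ranked ballots, where the order of preference is compared pair by pair.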
