LLM Voting: Human Choices and AI Collective Decision Making

Joshua C. Yang, Damian Dailisan, Marcin Korecki, Carina I. Hausladen, Dirk Helbing

TL;DR

This work investigates how large language models (LLMs) vote in a participatory budgeting context and how closely their votes align with those of human voters. The study compares GPT-4 Turbo and LLaMA-2 across four voting methods while varying temperature, list order, and persona and Chain-of-Thought (CoT) prompts, and it quantifies alignment with human preferences using Kendall's tau and Jaccard similarity. Key findings: LLMs exhibit bias and limited voting diversity; presentation order and persona affect outcomes; and CoT does not improve alignment but can aid explainability. Temperature settings expose a trade-off between preference diversity and alignment accuracy. The results underscore the need for careful, human-in-the-loop integration of LLMs in democratic processes to mitigate biases and preserve democratic legitimacy.

Abstract

This paper investigates the voting behaviors of Large Language Models (LLMs), specifically GPT-4 and LLaMA-2, their biases, and how they align with human voting patterns. Our methodology involved using a dataset from a human voting experiment to establish a baseline for human preferences and conducting a corresponding experiment with LLM agents. We observed that the choice of voting methods and the presentation order influenced LLM voting outcomes. We found that varying the persona can reduce some of these biases and enhance alignment with human choices. While the Chain-of-Thought approach did not improve prediction accuracy, it has potential for AI explainability in the voting process. We also identified a trade-off between preference diversity and alignment accuracy in LLMs, influenced by different temperature settings. Our findings indicate that LLMs may lead to less diverse collective outcomes and biased assumptions when used in voting scenarios, emphasizing the need for cautious integration of LLMs into democratic processes.
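
The summary above names Kendall's tau as one of the alignment measures between LLM and human outcomes. As a minimal sketch of how such a comparison could be computed, the snippet below implements the tau-a variant (no tie correction) over two rankings of the same projects. The project labels, the toy rankings, and the function name `kendall_tau` are illustrative assumptions, not data or code from the paper, and the paper may use a different tau variant.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items.

    Each argument maps an item to its rank position (1 = best).
    Ties are not corrected for; this is an assumption of the sketch.
    """
    items = list(rank_a)
    if set(items) != set(rank_b):
        raise ValueError("both rankings must cover the same items")
    if len(items) < 2:
        raise ValueError("need at least two items to compare")

    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # A pair is concordant when both rankings order it the same way.
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical aggregate rankings of four projects (not real data).
human = {"P1": 1, "P2": 2, "P3": 3, "P4": 4}
llm = {"P1": 2, "P2": 1, "P3": 3, "P4": 4}

tau = kendall_tau(human, llm)
print(f"tau = {tau:.3f}")
```

A value of 1 means the two rankings agree on every pair of projects, and -1 means one ranking is the exact reverse of the other. In the toy example, one of the six pairs is swapped, so tau is (5 - 1) / 6.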

Paper Structure (36 sections, 10 figures, 2 tables)

This paper contains 36 sections, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Overview of the LLM voting experimental setup
  • Figure 2: Overview of the LLM prompt template.
  • Figure 3: Histograms showing, for human and LLM voters, the percentage of ballots that select a given number of projects (out of 24) in Approval Voting.
  • Figure 4: A: Heatmap of Kendall's $\tau$ across different voting methods with votes cast by human ("_h"), GPT-4 ("_g"), and LLaMA-2 ("_l") voters, showing the similarity between ranking orders of 24 projects, across different groups and methods. Voting methods include 5-Approval (kapp), Approval (appr), Cumulative (10 points) (cumu), and Ranked (rank). Higher values of $\tau$ (green) show greater agreement between the compared voting methods. B: Comparative analysis of the primacy effect on LLM ranking behaviors in 5-Approval voting. The outcomes of human votes (baseline) are juxtaposed against results with reversed presentation order or ID sequence, showing how the presentation sequence affects ranking. Project rankings are color-coded to reflect their relative positioning, with green representing projects in the top half, and blue in the bottom half. In addition, the top and bottom three projects are shown with darker hues. Disparities in $\tau$ values underscore the variable susceptibility of each LLM to ordering effects in vote aggregation.
  • Figure 5: A: Stacked bar plots displaying the distribution of 5-Approval votes among different districts (left) and urban project categories (right) for various voter types, including human participants, LLaMA-2, GPT-4, and their respective enhancements (persona, CoT). The addition of persona and CoT is denoted as "+p" and "+c" respectively. Each bar's segment denotes the proportion of votes contributed by each voter type to the district, category, or cost. B: Box plot of the Jaccard similarity of individual human votes against the agent votes. The short red lines and annotated numbers denote mean values. The closer the agents' votes are to the human votes, the closer the value is to 1.0.
  • ...and 5 more figures
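
The Jaccard similarity reported in Figure 5B compares the set of projects an agent selects with the set an individual human selects. The sketch below shows one way this could be computed for 5-Approval ballots. The ballots, the project labels, and the choice to score one agent against every human are hypothetical assumptions for illustration; the paper's exact pairing of agents and humans is not given in this summary.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two approval sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention assumed here: two empty ballots count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical 5-Approval ballots (each voter selects five projects).
human_votes = [
    {"P1", "P2", "P3", "P4", "P5"},
    {"P2", "P3", "P6", "P7", "P8"},
]
agent_vote = {"P1", "P2", "P3", "P6", "P9"}

# Score the agent's ballot against each individual human ballot.
scores = [jaccard(agent_vote, h) for h in human_votes]
mean_score = sum(scores) / len(scores)
print(f"mean Jaccard = {mean_score:.3f}")
```

Each ballot here shares three projects with the agent's ballot out of seven distinct projects in the union, so both scores are 3/7. Identical ballots give 1.0 and disjoint ballots give 0.0.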