Generative AI Voting: Fair Collective Choice is Resilient to LLM Biases and Inconsistencies

Srijoni Majumdar; Edith Elkind; Evangelos Pournaras

TL;DR

This work investigates how Generative AI voting with large language models can scale deliberative participation while exposing biases and inconsistencies in AI representations of voters. Using a factorial design across real-world datasets and more than 50,000 AI voting personas in 306 elections, the authors quantify inconsistencies such as under-representation, AI approximation errors, and AI intransitivity across ballot formats. They demonstrate that fair collective choice methods like equal shares and Phragmén's rule yield greater resilience to AI biases, and that AI representations of abstaining voters can meaningfully recover lost consistency under these fair rules. The study also links abstainer traits to specific cognitive biases and uses explainable AI tools to map how these traits drive AI decisions, offering guidance for safeguarding democracy as AI-assisted voting scales up in participatory budgeting and elections.

Abstract

Scaling up deliberative and voting participation is a longstanding endeavor, a cornerstone for direct democracy and legitimate collective choice. Recent breakthroughs in generative artificial intelligence (AI) and large language models (LLMs) unlock new capabilities for AI personal assistants to overcome the cognitive bandwidth limitations of humans, providing decision support or even direct representation of human voters at large scale. However, the quality of this representation, and which underlying biases manifest when collective decision-making is delegated to LLMs, are alarming and timely challenges to tackle. By rigorously emulating with high realism more than 50K LLM voting personas in 306 real-world voting elections, we disentangle the nature of different biases in LLMs (GPT 3, GPT 3.5, and Llama2). Complex preferential ballot formats exhibit significant inconsistencies compared to simpler majoritarian elections, which show higher consistency. Strikingly though, by demonstrating for the first time in the real world a proportional representation of voters in direct democracy, we are also able to show that fair ballot aggregation methods, such as equal shares, prove to be a win-win: fairer voting outcomes for humans with fairer AI representation, especially for voters who are likely to abstain. This novel underlying relationship proves paramount for democratic resilience in progressive scenarios with low voter turnout and voter fatigue supported by AI representatives: the effect of abstaining voters is mitigated by recovering highly representative voting outcomes that are fairer. These interdisciplinary insights provide remarkable foundations for science, policymakers, and citizens to develop safeguards and resilience for AI risks in democratic innovations.

Paper Structure (11 sections, 1 equation, 5 figures, 1 table)

Introduction
Results
Fair collective choice is resilient to human-AI inconsistencies
AI representatives to recover from low voters turnout
Biases explaining AI (in)consistencies in choice and preference transitivity
Discussion
Methods
Emulating AI representatives
Voting datasets
Evaluation of choices by AI representatives
Explainability of generative AI voting

Figures (5)

Figure 1: An overview of the studied generative AI voting framework. (A) Three manifestations of choice inconsistency are distinguished, measured using Condorcet pairwise matches: (i) inconsistency by under-representation as a result of low voter turnout, (ii) inconsistency by inaccuracy of AI choice in approximating human choice and (iii) inconsistency by intransitivity of AI choices over different ballot formats. (B) The factorial design with the 7 studied dimensions: (i) Real-world voting scenarios in the context of participatory budgeting and national elections. (ii) Various combinations of personal human traits (features) based on which AI voting personas are created. (iii) Four ballot formats. (iv) Four AI models: three large language models and a predictive machine learning model (benchmark). (v) Ballot aggregation methods for elections and participatory budgeting. (vi) The three abstaining models, which are based on engagement, digital literacy and trust. (vii) Participation modalities ranging from exclusive human participation of varying turnout to mixed populations of humans and AI representatives of abstained voters. The studied combinations for each voting scenario are marked with different colors, see also Table \ref{table:persona1}. (C) The framework of generative AI voting. For each voter in the real-world voting scenario, a prompt is given to large language models to construct the voting persona. The input is the personal human traits, the voting options, and the ballot format, with instructions for the voting persona on how to make a choice. This choice is the output of the persona. Both human and AI choices are aggregated using a ballot aggregation method. The inconsistencies of individual and collective choices for humans and AI personas are assessed, along with potential biases that explain these inconsistencies. (D) The personal human traits are mapped to cognitive biases. Section \ref{sec:bias} traces the origin of choice inconsistencies to potential cognitive biases.
Figure 2: Choice by large language models is consistent with human choice for single-choice majoritarian elections; however, accuracy drops for more complex ballots with a larger number of alternatives, as in the case of participatory budgeting. Strikingly, the accuracy of collective choice is significantly higher than that of individual choice, particularly for the fairer ballot aggregation rules of equal shares and Phragmén's. GPT 3.5 shows the highest consistency and Llama3-8B the lowest among the large language models, all of which nonetheless remain inferior to a predictive machine learning model. The mean consistency (y-axis) for different populations of voters (10%, 25%, 75% and 100%) in individual and collective choice is shown for different AI models (x-axis), across three real-world voting scenarios: the participatory budgeting campaign of City Idea, (A) actual and (B) survey, as well as (C) the US national elections of 2012, 2016 and 2020. For participatory budgeting, the ballot formats of cumulative/score (left) and approval (right) are shown, including the ballot aggregation methods of equal shares, Phragmén's and utilitarian greedy. For the actual voting of City Idea, the accuracy of equal shares is calculated for all winners and for a controlled number of winners (as many as greedy) for a fairer comparison.
Figure 3: Intransitivity of AI across different pairs of ballot formats is higher than that of humans, which remains negligible. AI intransitivities have a higher influence on the consistency of voting outcomes over a large number of alternatives. Llama3-8B predicts ballots that are not very diverse and selects a limited set of projects, which results in higher transitivity compared to other language models. Equal shares and Phragmén's also show here a higher capacity to mitigate ballot intransitivities, achieving more than 80% consistency between cumulative and approval ballots. The consistency (y-axis) in individual choice among different pairs of ballot formats (x-axis) is shown for different large language models, humans and the two voting scenarios in the participatory budgeting campaign of City Idea: (A) actual vs. (B) survey. Mean consistency values are calculated across randomly sampled populations of 25%, 50%, and 75%.
Figure 4: Representing more than half of human abstaining voters with AI results in significant consistency recovery, in particular for fair ballot aggregation methods. Strikingly, AI representation of abstained voters is more effective than representing arbitrary voters (random control). Consistency recovery occurs at two levels: (i) false negative projects, removed under abstaining but added back by AI representatives, which are higher in ranking and number than (ii) false positive projects, added under abstaining but removed by AI representatives. The consistency loss in voting outcomes caused by low voter turnout (x-axis) is emulated by removing different ratios of human voters (25%, 50%, 75% and 100%) among the whole population (baseline) and among those who are likely to abstain: low engagement, trust and digital literacy profiles (% of the abstaining populations in the brackets on top). A consistency recovery (y-axis) is hypothesized by AI representation using GPT 3.5. (A) Actual participatory budgeting campaign of City Idea. (B) Studied participation modalities. (C)-(D) The origin of consistency recovery in participatory budgeting for utilitarian greedy and equal shares respectively. Abstaining voters result in falsely removing (left) and falsely adding (right) winning projects. AI representatives add back and remove these projects respectively to recover consistency. The projects and their probability to recover consistency under random control are shown for comparison.
Figure 5: Compared to an arbitrary abstaining voter, those with low engagement and digital literacy exhibit characteristics that explain the consistency of human-AI representation and ballot formats, for instance, no interest in politics and support for education/health projects, which are related to unconscious and surrogation biases. Time discounting, affect and conformity biases, such as preference for public space and environmental projects as well as support for families, contribute to the consistency of human-AI choice. Time discounting factors, such as preference for sport and welfare projects, as well as affect heuristics, such as preference for projects that benefit families, explain AI consistency among ballot formats. The relative importance of the personal human traits (y-axis) for the actual participatory budgeting campaign of City Idea over different large language models (x-axis) is depicted by the size of the bubbles and is calculated using Shapley additive explanations. The consistency of human-AI representation (A) and ballot formats (B) (single choice vs. cumulative) is assessed. For each of these, the personal human traits explain the following: (i) The consistency difference between the three abstaining models and their random control. (ii) The (in)consistency of AI representation and transitivity for the whole population. The '-' sign indicates non-significant values (p>0.05). (C)-(D) The statistically significant biases present in all AI models and datasets are summarized by chord diagrams.
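
The captions above refer to utilitarian greedy aggregation and to consistency measured through pairwise matches. As a rough illustration only, the Python sketch below implements the textbook utilitarian greedy rule for participatory budgeting (fund projects in descending order of total score while the budget allows) and one plausible reading of a pairwise-match consistency between two ballots. The function names, the tie-breaking by project name, the example projects and the exact form of the consistency score are assumptions made for this sketch and are not taken from the paper; equal shares and Phragmén's rule are not covered.

```python
from itertools import combinations


def utilitarian_greedy(ballots, costs, budget):
    """Fund projects in descending order of total score while budget remains.

    ballots: list of dicts mapping project -> score (approval ballots use 0/1).
    costs:   dict mapping project -> cost; ballots may only name these projects.
    """
    totals = {project: 0 for project in costs}
    for ballot in ballots:
        for project, score in ballot.items():
            totals[project] += score

    winners, remaining = [], budget
    # Assumption: ties in total score are broken alphabetically by project name.
    for project in sorted(costs, key=lambda p: (-totals[p], p)):
        if costs[project] <= remaining:
            winners.append(project)
            remaining -= costs[project]
    return winners


def pairwise_consistency(ballot_a, ballot_b, projects):
    """Share of project pairs that two ballots order in the same way.

    A hypothetical stand-in for a pairwise-match measure: for each pair of
    projects, compare which one each ballot scores higher (or whether they tie).
    Projects missing from a ballot are treated as having score 0.
    """
    def order(ballot, x, y):
        sx, sy = ballot.get(x, 0), ballot.get(y, 0)
        return (sx > sy) - (sx < sy)  # 1, 0 or -1

    pairs = list(combinations(projects, 2))
    matches = sum(order(ballot_a, x, y) == order(ballot_b, x, y) for x, y in pairs)
    return matches / len(pairs)


# Hypothetical example: three voters, three projects, approval ballots.
costs = {"park": 50, "school": 40, "bike_lane": 30}
human_ballots = [{"park": 1, "school": 1}, {"school": 1, "bike_lane": 1}, {"park": 1}]
ai_ballots = [{"park": 1, "school": 1}, {"school": 1}, {"park": 1, "bike_lane": 1}]

human_winners = utilitarian_greedy(human_ballots, costs, budget=80)
ai_winners = utilitarian_greedy(ai_ballots, costs, budget=80)
individual = [
    pairwise_consistency(h, a, list(costs))
    for h, a in zip(human_ballots, ai_ballots)
]
```

In this toy example the individual ballots of the second and third voter differ between the human and AI versions, yet both sets of ballots produce the same winners, which mirrors in miniature the observation in the Figure 2 caption that collective choice can be more consistent than individual choice.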
