Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models

Maximilian Kreutner, Marlene Lutz, Markus Strohmaier

TL;DR

This work analyzes whether zero-shot persona prompting with limited information can accurately predict individual voting decisions and, by aggregation, the positions of European groups on a diverse set of policies. It also evaluates whether predictions are stable under counterfactual arguments, different persona prompts, and generation methods.

Abstract

Large Language Models (LLMs) display remarkable capabilities to understand or even produce political discourse but have been found to consistently exhibit a progressive left-leaning bias. At the same time, so-called persona or identity prompts have been shown to produce LLM behavior that aligns with socioeconomic groups with which the base model is not aligned. In this work, we analyze whether zero-shot persona prompting with limited information can accurately predict individual voting decisions and, by aggregation, accurately predict the positions of European groups on a diverse set of policies. We evaluate whether predictions are stable in response to counterfactual arguments, different persona prompts, and generation methods. We find that we can simulate the voting behavior of Members of the European Parliament reasonably well, achieving a weighted F1 score of approximately 0.793. Our persona dataset of politicians in the 2024 European Parliament and our code are available at the following URL: https://github.com/dess-mannheim/european_parliament_simulation.
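
For readers unfamiliar with the metric, the weighted F1 score mentioned in the abstract can be sketched as the support-weighted mean of per-class F1 scores over the three vote labels. This is a minimal illustration, not the authors' evaluation code: the function name `weighted_f1` and the choice to score a class with no predictions and no true positives as 0 are assumptions.

```python
from collections import Counter

# The three vote labels used in the simulation.
LABELS = ("FOR", "AGAINST", "ABSTENTION")


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = len(y_true)
    if total == 0:
        raise ValueError("at least one vote is required")

    support = Counter(y_true)  # number of true votes per label
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Assumption: a class with an undefined F1 contributes 0.
        f1 = 2 * tp / denom if denom else 0.0
        score += support[label] / total * f1
    return score


if __name__ == "__main__":
    truth = ["FOR", "FOR", "AGAINST", "ABSTENTION"]
    preds = ["FOR", "FOR", "AGAINST", "FOR"]
    print(round(weighted_f1(truth, preds), 3))
```

Because each class is weighted by its share of the true votes, a model that rarely predicts a minority label such as ABSTENTION is penalized only in proportion to how often that label actually occurs.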

Paper Structure

This paper contains 33 sections, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Simulating the EU parliament. We prompt the LLM to adopt the identity of each member of the European Parliament. After providing the LLM with information about a proposal, we ask it to cast a vote in favor of (FOR) or against (AGAINST) the proposal, or to abstain (ABSTENTION) from voting. We find that by using persona prompting, we can approximate the individual voting behavior of the members of the EU parliament, achieving a weighted F1 score of 0.80.
  • Figure 2: Distribution of predicted votes per European group. We display the voting predictions of the best performing model (Llama3-70B with attribute prompting and reasoning) compared to the ground truth. The weighted F1 score is displayed above each group. The model predicts the votes of center-left and progressive groups (S&D, Renew, Greens/EFA) the best and performs worst for groups at the edge of the political spectrum (ID, GUE/NGL, ECR). Notably, the model almost never predicts abstentions.
  • Figure 3: Influence of counterfactual speeches on vote prediction across groups. We compare the voting behavior of Llama3-70B when prompting with counterfactual speeches as opposed to the original speeches. Personas based on MEPs affiliated with groups at the edges of the political spectrum (GUE/NGL, ID, ECR) tend to change their votes more frequently.
  • Figure 4: Influence of the persona description on vote prediction across groups. We illustrate the vote distribution when prompting Llama3-8B with only the name vs. prompting with all persona attributes. Providing all attributes of a persona significantly changes the voting behavior of the model compared to providing only the name.
  • Figure 5: Distribution of predicted votes per European group for Qwen-72B. We display the voting predictions of the best performing Qwen approach (Qwen-72B with attribute prompting and reasoning) compared to the ground truth. The weighted F1 score is displayed above each group. The model predicts the votes of center-left and progressive groups (S&D, Renew, Greens/EFA) the best and performs worst for groups at the edge of the political spectrum (ID, GUE/NGL, ECR). Compared to Llama3-70B, the model predicts more abstentions, but these are seldom correct (precision 0.129, recall 0.095).
  • ...and 4 more figures
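
The per-group vote distributions described in the figure captions can be tabulated from individual predictions with a few lines of Python. The sketch below is illustrative only: the function names and the `(group, vote)` input format are assumptions, and reading a group's position as its most common vote (plurality) is one possible aggregation rule, not necessarily the one used in the paper.

```python
from collections import Counter, defaultdict


def vote_distribution_by_group(votes):
    """Map each group to the share of each vote label among its members.

    `votes` is an iterable of (group, vote) pairs (hypothetical format).
    """
    counts = defaultdict(Counter)
    for group, vote in votes:
        counts[group][vote] += 1
    return {
        group: {vote: n / sum(c.values()) for vote, n in c.items()}
        for group, c in counts.items()
    }


def plurality_position(distribution):
    """Assumed aggregation rule: the most common vote within one group."""
    return max(distribution, key=distribution.get)


if __name__ == "__main__":
    predicted = [
        ("S&D", "FOR"),
        ("S&D", "FOR"),
        ("S&D", "AGAINST"),
        ("ID", "AGAINST"),
    ]
    for group, dist in vote_distribution_by_group(predicted).items():
        print(group, dist, plurality_position(dist))
```

Running the same function on the ground-truth votes and on the predicted votes gives two distributions per group that can be compared side by side.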