Privacy-Aware Predictions in Participatory Budgeting

Juan Zambrano, Clément Contet, Jairo Gudiño-Rosero, Felipe Garrido-Lucero, Umberto Grandi, César Hidalgo

TL;DR

Privacy-Aware Predictions in Participatory Budgeting tackles predicting voter support for proposals at the submission stage using only textual descriptions and anonymous historical voting records, avoiding voter demographics. The authors construct two multi-year city datasets (Toulouse and Wroclaw) and compare classical predictors (ElasticNet, XGBoost) with LLM-based prompting, including a retrieval-augmented generation (RAG) setup. They show that LLMs can meaningfully predict proposal rankings, with ranking correlations improving notably under RAG and when text content is utilized, while No-Text baselines perform worse. The approach offers a practical, privacy-preserving tool for PB organizers to manage large proposal volumes, though limitations such as access to commercial LLMs, reliance on RAG, and ethical considerations around algorithmic influence warrant careful deployment and further study.

Abstract

Participatory budgeting is a democratic innovation that empowers citizens to propose and vote on public investment projects. While researchers in computer science have focused on improving the voting phase of this process, in this work we aim to support organizers of participatory budgeting campaigns in managing large volumes of project proposals at the submission stage. We propose a privacy-preserving approach to predict which proposals are likely to be funded, using only projects' textual descriptions and anonymous historical voting records, without relying on voter demographics or personally identifiable information.

Paper Structure

This paper contains 20 sections, 1 equation, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Prompt for the Wroclaw dataset. The "context from previous election" box represents the context given by the three prompt variants considered (zero-shot, full context, RAG).
  • Figure 2: General workflow of our methodology to validate predictions of voting outcomes in PB elections. Projects and vote counts from a PB election are given as training data, with the objective of predicting the vote count of projects in the following election.
  • Figure 3: Predicted vs. real vote counts, normalized by number of voters for Llama 3.3 70B on the Toulouse dataset. The figures compare zero-shot (top) and RAG (bottom) prompts. The dashed line is the identity line ($y=x$), indicating perfect prediction.
  • Figure 4: Jaccard Index of the top-$k$ most-voted projects ($2\% \leq k \leq 30\%$) for Toulouse (top) and Wrocław (bottom). The plots compare two ML baselines (Elastic Net, XGBoost) and RAG prompts for two LLMs (Llama 3.3 70B, GPT-4 Turbo).
  • Figure 5: Aggregated cost of the top-$k$ most-voted projects for $0\% \leq k \leq 100\%$.
  • ...and 3 more figures
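
The Jaccard Index of the top-$k$ most-voted projects mentioned in the Figure 4 caption can be illustrated with a minimal sketch. It compares the set of projects ranked in the top $k$ by real votes with the set ranked in the top $k$ by predicted votes. The function names, the toy vote counts, the rounding of $k$ (ceiling of the fraction times the number of projects), and the tie-breaking by project id are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def top_k_ids(votes, fraction):
    """Return the ids of the top `fraction` of projects by vote count.

    Assumption: k is rounded up, and ties are broken by project id.
    """
    k = max(1, math.ceil(fraction * len(votes)))
    ranked = sorted(votes, key=lambda pid: (-votes[pid], pid))
    return set(ranked[:k])


def jaccard_top_k(real_votes, predicted_votes, fraction):
    """Jaccard index |A & B| / |A | B| between real and predicted top-k sets."""
    real_top = top_k_ids(real_votes, fraction)
    pred_top = top_k_ids(predicted_votes, fraction)
    return len(real_top & pred_top) / len(real_top | pred_top)


# Hypothetical vote counts for five projects (not data from the paper).
real = {"A": 500, "B": 420, "C": 310, "D": 120, "E": 90}
pred = {"A": 450, "B": 200, "C": 380, "D": 260, "E": 50}

# Top 40% means k = 2: real top set is {A, B}, predicted top set is {A, C}.
score = jaccard_top_k(real, pred, 0.4)
print(f"Jaccard index at top 40%: {score:.3f}")
```

A value of 1 means the predicted and real top-$k$ sets coincide, and 0 means they share no project. In this toy case the two sets share one project out of three distinct ones.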