ParlAI Vote: A Web Platform for Analyzing Gender and Political Bias in Large Language Models

Wenjie Lin, Hange Liu, Yingying Zhuang, Xutao Mao, Jingwei Shi, Xudong Han, Tianyu Shi, Jinrui Yang

TL;DR

ParlAI Vote tackles the need for transparent, demographic-aware evaluation of language models in political contexts by uniting debate content, roll-call votes, and demographic data in an interactive web platform. It enables experiments with frontier and open-source LLMs to predict votes, assess gender and political biases, and inspect model reasoning and counterfactual scenarios. The work contributes the first unified AI-powered interface for linking issue speeches to outcomes, along with demonstrations of bias patterns and explanations. The platform supports reproducible analysis, education, and public engagement, highlighting limitations of current LLMs in multilingual, politically sensitive settings.

Abstract

We present ParlAI Vote, an interactive web platform for exploring European Parliament debates and votes, and for testing LLMs on vote prediction and bias analysis. This web system connects debate topics, speeches, and roll-call outcomes, and includes rich demographic data such as gender, age, country, and political group. Users can browse debates, inspect linked speeches, compare real voting outcomes with predictions from frontier LLMs, and view error breakdowns by demographic group. Visualizing the EuroParlVote benchmark and its core tasks of gender classification and vote prediction, ParlAI Vote highlights systematic performance bias in state-of-the-art LLMs. It unifies data, models, and visual analytics in a single interface, lowering the barrier for reproducing findings, auditing behavior, and running counterfactual scenarios. This web platform also shows model reasoning, helping users see why errors occur and what cues the models rely on. It supports research, education, and public engagement with legislative decision-making, while making clear both the strengths and the limitations of current LLMs in political analysis.
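
The error breakdown by demographic group mentioned above can be illustrated with a minimal sketch. It is not the platform's implementation: the record fields (`gender`, `actual_vote`, `predicted_vote`), the vote labels, and the helper `error_rate_by_group` are hypothetical names chosen for illustration, and the toy records are made up rather than drawn from EuroParlVote.

```python
from collections import defaultdict


def error_rate_by_group(records, group_key):
    """Return {group: fraction of wrong predictions} for a list of dict records.

    Each record is assumed (hypothetically) to carry the demographic field
    named by `group_key` plus "actual_vote" and "predicted_vote".
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        # A prediction counts as an error when it differs from the real vote.
        if rec["predicted_vote"] != rec["actual_vote"]:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Made-up toy records, for illustration only.
records = [
    {"gender": "F", "actual_vote": "FOR", "predicted_vote": "FOR"},
    {"gender": "F", "actual_vote": "AGAINST", "predicted_vote": "FOR"},
    {"gender": "M", "actual_vote": "FOR", "predicted_vote": "FOR"},
    {"gender": "M", "actual_vote": "ABSTENTION", "predicted_vote": "ABSTENTION"},
]

rates = error_rate_by_group(records, "gender")
print(rates)
```

Passing a different `group_key` (for example a country or political-group field, if the records carry one) gives the same breakdown along another demographic axis; a gap between groups is the kind of performance disparity the platform is described as surfacing.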

Paper Structure

This paper contains 7 sections, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Vote prediction results for Marc Angel on the Ecodesign Regulation, with the topic and the referenced speech provided as context and Llama-3.2 used as the LLM predictor.
  • Figure 2: Failure categories across LLMs. The x-axis lists the three error types, and the y-axis shows the percentage of errors that fall into each category. The percentages do not sum to 100% because some errors have other causes.