A Cautionary Tale About "Neutrally" Informative AI Tools Ahead of the 2025 Federal Elections in Germany
Ina Dormuth, Sven Franke, Marlies Hafer, Tim Katzke, Alexander Marx, Emmanuel Müller, Daniel Neider, Markus Pauly, Jérôme Rutinowski
TL;DR
This work assesses the reliability of AI-based VAAs and LLMs for political information ahead of Germany's 2025 election by benchmarking against Wahl-O-Mat statements. It employs a stochastic, multi-model evaluation (ChatGPT 4o, DeepSeek V3, and DeepSeek R1) with five repetitions per prompt, using Retrieval-Augmented Generation and prompt variations to measure alignment with major parties. Key findings show a pronounced left-leaning bias in LLM responses (e.g., Greens and SPD around 79–86%) and substantial deviations/hallucinations in VAAs (Wahl.Chat deviates in 25% of cases; WAHLWEISE in 54%). The results underscore critical risks of deploying LLM-based VAAs in electoral contexts, necessitating rigorous certification, scrutiny of prompt sensitivity, and mechanisms to ensure factual alignment with party positions.
Abstract
In this study, we examine the reliability of AI-based Voting Advice Applications (VAAs) and large language models (LLMs) in providing objective political information. Our analysis is based on a comparison with party responses to 38 statements of the Wahl-O-Mat, a well-established German online tool that helps inform voters by comparing their views with political party positions. For the LLMs, we identify significant biases. They exhibit a strong alignment (over 75% on average) with left-wing parties and a substantially lower alignment with center-right (below 50%) and right-wing parties (around 30%). Furthermore, for the VAAs, which are intended to inform voters objectively, we found substantial deviations from the parties' stated positions in Wahl-O-Mat: while one VAA deviated in 25% of cases, another showed deviations in more than 50% of cases. For the latter, we even observed that simple prompt injections led to severe hallucinations, including false claims of non-existent ties between political parties and right-wing extremism.
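To make the notion of an alignment percentage concrete, the following is a minimal sketch of one way such a score could be computed. It is not taken from the study: the three-way answer encoding (`agree`, `neutral`, `disagree`), the majority vote over repeated runs, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def majority_answer(repetitions):
    """Return the most frequent answer among repeated runs of one prompt."""
    return Counter(repetitions).most_common(1)[0][0]


def alignment(model_runs, party_positions):
    """Share of statements where the model's majority answer equals the party's answer.

    Assumption: exact match on a three-way scale; the study may score differently.
    """
    matches = sum(
        majority_answer(runs) == party_positions[statement]
        for statement, runs in model_runs.items()
    )
    return matches / len(model_runs)


# Hypothetical toy data: four statements, five repetitions each.
model_runs = {
    "s1": ["agree"] * 5,
    "s2": ["agree", "agree", "neutral", "agree", "disagree"],
    "s3": ["disagree"] * 4 + ["neutral"],
    "s4": ["neutral", "neutral", "agree", "neutral", "neutral"],
}
party_a = {"s1": "agree", "s2": "agree", "s3": "disagree", "s4": "agree"}
party_b = {"s1": "disagree", "s2": "agree", "s3": "agree", "s4": "disagree"}

print(f"party_a: {alignment(model_runs, party_a):.0%}")
print(f"party_b: {alignment(model_runs, party_b):.0%}")
```

With real data, `model_runs` would hold one entry per Wahl-O-Mat statement, and the same function could be applied once per party to compare alignment across the political spectrum.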
