Assessing Political Bias in Large Language Models
Luca Rettenberger, Markus Reischl, Mark Schutera
TL;DR
The paper investigates political bias in open-source Large Language Models by measuring their alignment with German party positions using the Wahl-O-Mat voting advice application in the context of the 2024 European Parliament elections. It compares multiple models prompted in German and in English and reveals biases that depend on both the prompt language and the model size: Llama3-70B shows strong left-leaning alignment in both languages, while alignment with the AfD remains consistently low. The results show that the prompt language significantly shapes the measured bias, and suggest that model capacity and training data influence how political content is generated at scale. The work highlights the necessity of bias transparency, robust evaluation, and human-in-the-loop safeguards to protect democratic processes while enabling the constructive use of AI in political contexts.
Abstract
The assessment of bias within Large Language Models (LLMs) has emerged as a critical concern in the contemporary discourse surrounding Artificial Intelligence (AI), given their potential impact on societal dynamics. Recognizing and accounting for political bias in LLM applications is especially important as we approach the tipping point toward performative prediction. At that point, it becomes essential to understand the potential effects and the societal behavior that LLMs can drive at scale through their interplay with human operators. In particular, the upcoming European Parliament elections will not remain unaffected by LLMs. We evaluate the political bias of the currently most popular open-source LLMs (instruct or assistant models) on political issues within the European Union (EU) from a German voter's perspective. To do so, we use the "Wahl-O-Mat," a voting advice application used in Germany. From the voting advice of the "Wahl-O-Mat," we quantify the degree of alignment of LLMs with German political parties. We show that larger models, such as Llama3-70B, tend to align more closely with left-leaning political parties, while smaller models often remain neutral, particularly when prompted in English. The central finding is that LLMs are similarly biased, with low variance across models in their alignment with any given party. Our findings underline the importance of rigorously assessing bias in LLMs and making it transparent, in order to safeguard the integrity and trustworthiness of applications that employ the capabilities of performative prediction and the invisible hand of machine learning prediction and language generation.
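The abstract does not spell out how an alignment score is derived from the Wahl-O-Mat. The following is a minimal sketch that assumes the scoring rule commonly described for the Wahl-O-Mat: 2 points for identical positions on a thesis, 1 point when exactly one side is neutral, and 0 points for opposite positions, reported as a percentage of the maximum attainable points, with skipped theses excluded. It also computes the per-party mean and variance of alignment across models, which corresponds to the "low variance" finding above. The answer encoding, the function names, the model and party names, and all positions are hypothetical toy values. This is not the authors' implementation and does not use real Wahl-O-Mat data.

```python
from statistics import mean, pvariance

# Hypothetical answer encoding: +1 = agree, 0 = neutral, -1 = disagree.
# None marks a thesis the model refused or failed to answer (treated as skipped).


def thesis_points(model_answer: int, party_answer: int) -> int:
    """Assumed rule: 2 for identical positions, 1 if one side is neutral, 0 if opposite."""
    return 2 - abs(model_answer - party_answer)


def alignment(model_answers, party_answers) -> float:
    """Percentage of the maximum attainable points, ignoring skipped theses."""
    earned = maximum = 0
    for m, p in zip(model_answers, party_answers):
        if m is None:  # skipped theses do not count toward the maximum
            continue
        earned += thesis_points(m, p)
        maximum += 2
    return 100.0 * earned / maximum if maximum else 0.0


def per_party_spread(scores):
    """Mean and population variance of each party's alignment across models."""
    parties = next(iter(scores.values())).keys()
    return {
        party: (mean(s[party] for s in scores.values()),
                pvariance([s[party] for s in scores.values()]))
        for party in parties
    }


# Toy, made-up positions on four theses (hypothetical parties and models).
party_positions = {"Party A": [1, 1, -1, 0], "Party B": [-1, 0, 1, 1]}
model_answers = {"model-x": [1, 0, -1, None], "model-y": [1, 1, 0, 0]}

scores = {
    model: {party: alignment(answers, pos) for party, pos in party_positions.items()}
    for model, answers in model_answers.items()
}

if __name__ == "__main__":
    for model, row in scores.items():
        print(model, {party: round(v, 1) for party, v in row.items()})
    print(per_party_spread(scores))
```

Under this assumed scheme, a model that answers every thesis as neutral earns exactly 50% alignment with any party, which is one way a "neutral" smaller model can surface in such scores. The paper's actual scoring and its handling of refusals may differ.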
