Analyzing Political Bias in LLMs via Target-Oriented Sentiment Classification

Akram Elbouanani, Evan Dufraisse, Adrian Popescu

TL;DR

The paper develops a scalable framework to analyze political bias in large language models by treating target-oriented sentiment classification as the testbed and measuring prediction inconsistency across entity substitutions with an entropy-based metric. It constructs a large, multilingual dataset (450 sentences, 1319 politicians across six languages) and evaluates seven models to reveal systematic biases that favor left/center-left alignments and disfavor right/far-right alignments, with bias intensity increasing for larger models and varying across languages. A key contribution is showing that LLMs encode internal representations of entities and exhibit cross-language bias patterns, along with a mitigation strategy—replacing politician names with fictional counterparts—that reduces inconsistencies and improves robustness. The work highlights implications for deploying LLMs in socially sensitive tasks and provides a practical, model-agnostic approach for bias assessment and mitigation, while acknowledging limitations in data representativeness and evolving political contexts.

Abstract

Political biases encoded by LLMs might have detrimental effects on downstream applications. Existing bias analysis methods rely on small-size intermediate tasks (questionnaire answering or political content generation) and on the LLMs themselves for analysis, thus propagating bias. We propose a new approach leveraging the observation that LLM sentiment predictions vary with the target entity in the same sentence. We define an entropy-based inconsistency metric to encode this prediction variability. We insert 1319 demographically and politically diverse politician names in 450 political sentences and predict target-oriented sentiment using seven models in six widely spoken languages. We observe inconsistencies in all tested combinations and aggregate them in a statistically robust analysis at different granularity levels. We observe positive bias toward left politicians and negative bias toward far-right politicians, as well as positive correlations between politicians with similar alignment. Bias intensity is higher for Western languages than for others. Larger models exhibit stronger and more consistent biases and reduce discrepancies between similar languages. We partially mitigate LLM unreliability in target-oriented sentiment classification (TSC) by replacing politician names with fictional but plausible counterparts.
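The abstract does not spell out the entropy-based inconsistency metric, so the following is only a minimal sketch of one plausible reading: for a single sentence, collect the sentiment label predicted for each inserted politician name and take the Shannon entropy of the resulting label distribution. The function name `inconsistency`, the three-class label set, and the normalization by the log of the number of classes are assumptions made for illustration, not the paper's definition.

```python
import math
from collections import Counter

# Hypothetical label set; the paper's actual classes are not given in this summary.
CLASSES = ("negative", "neutral", "positive")

def inconsistency(labels, classes=CLASSES):
    """Normalized Shannon entropy of the labels predicted for ONE sentence
    when only the target entity name is varied.

    0.0 means every substitution received the same label (fully consistent);
    1.0 means the labels are spread uniformly over all classes.
    Assumption: normalizing by log(len(classes)) is our choice, not the paper's.
    """
    if not labels:
        raise ValueError("need at least one prediction")
    total = len(labels)
    counts = Counter(labels)
    # Sum p * log(1/p) over the labels that actually occur.
    entropy = sum((c / total) * math.log(total / c) for c in counts.values())
    return entropy / math.log(len(classes))

# Same sentence, four different politician names inserted (made-up predictions).
same = ["positive", "positive", "positive", "positive"]
mixed = ["positive", "negative", "positive", "neutral"]
print(inconsistency(same))
print(inconsistency(mixed))
```

Under this reading, a model whose prediction depends only on the sentence scores 0 on every sentence, and any positive value signals that the entity name alone changed the predicted sentiment.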

Paper Structure

This paper contains 47 sections, 5 equations, 13 figures, 12 tables, 2 algorithms.

Figures (13)

  • Figure 1: Sentiment-prediction-based analysis of model-language combinations when varying names in political sentences, before and after replacing politician names with fictional but plausible names (shown with no vs. 0.7 transparency, respectively). The desired behavior combines high accuracy, reflecting a correct understanding of the sentiment associated with names, and low inconsistency (Eq. \ref{eq_consistency}), reflecting a lack of bias toward the analyzed entities. The comparison highlights the entity-related bias encoded in LLMs and the effectiveness of the name replacement mitigation approach.
  • Figure 2: Average sentiment scores for languages (aggregated across all models) and for models (aggregated across all languages) per alignment. For each language or model, the averages are centered around the mean of the sentiments. Shaded areas represent 95% confidence intervals. The results indicate a consistent positive bias for CC, CL, and LL politicians, and a negative bias for RR and FR politicians across all languages and models. English, French, and Spanish exhibit stronger biases than Arabic, Chinese, and Russian. Additionally, larger models tend to demonstrate higher biases than smaller models.
  • Figure 3: Boxplots depicting average sentiment scores for entities across political alignments in English and Arabic. The results reveal a pronounced bias in English, particularly a strong negative bias for far-right figures and a positive one for left figures. In contrast, biases in Arabic are less discernible, except for Aya-Expanse-32B, a model trained for multilingual tasks, which exhibits more apparent biases in Arabic as well—showing positive sentiment toward LL and CL (center-left) figures and negative sentiment toward FR figures.
  • Figure 4: Jaccard similarity index between the sentiment predictions in the tested languages obtained with Qwen-7B, Qwen-72B, and Aya-Expanse-32B.
  • Figure 5: Political compasses showing sentiment bias for GPT-4o-mini in English (left) and Chinese (right). The y-axis represents the social policy spectrum (0: socially progressive, 10: socially conservative), and the x-axis represents the economic policy spectrum (0: fiscally progressive, 10: fiscally conservative). Parties are positioned using ParlGov data, with colors indicating the average sentiment of affiliated politicians (red: negative, green: positive). Blank squares denote no corresponding party. Left-libertarian parties consistently receive positive sentiment, while right-authoritarian parties show negative sentiment, highlighting consistent ideological biases across languages.
  • ...and 8 more figures
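The Figure 4 caption mentions a Jaccard similarity index between sentiment predictions in different languages but does not say how the compared sets are built. As a rough sketch under that caveat, one option is to treat each language's predictions as a set of (sentence, politician, label) triples and compute intersection over union. The dictionary layout, the helper name `jaccard_predictions`, and the toy data below are hypothetical.

```python
def jaccard_predictions(preds_a, preds_b):
    """Jaccard index between two prediction tables.

    Each table maps a (sentence_id, politician) key to a sentiment label.
    Assumption: a prediction counts as shared only if both the key and the
    label match; the paper may construct the sets differently.
    """
    set_a = set(preds_a.items())
    set_b = set(preds_b.items())
    union = set_a | set_b
    if not union:
        return 1.0  # two empty tables are treated as identical
    return len(set_a & set_b) / len(union)

# Toy predictions for the same model in two languages (made-up values).
english = {(1, "A"): "positive", (1, "B"): "negative", (2, "A"): "neutral"}
french = {(1, "A"): "positive", (1, "B"): "neutral", (2, "A"): "neutral"}
print(jaccard_predictions(english, french))  # 2 shared triples out of 4 distinct -> 0.5
```

With this encoding, identical prediction tables score 1.0 and tables that disagree on every label score 0.0, so the value can be read as the degree to which a model labels the same inputs the same way across two languages.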