Examining the Influence of Political Bias on Large Language Model Performance in Stance Classification

Lynnette Hui Xian Ng; Iain Cruickshank; Roy Ka-Wei Lee

TL;DR

This study investigates how political biases in large language models affect stance classification across three datasets (BASIL, SemEval2016, Elections2016) and seven LLMs using four prompting schemes. It finds statistically significant left-vs-right performance differences at the dataset level ($p<0.01$) and shows that increasing target ambiguity generally reduces stance classification accuracy, while prompting schemes have mixed effects. Notably, prompting the model to reason about political bias (bias CoT) often does not mitigate bias and can amplify it, suggesting data-driven origins of bias. The work highlights dataset instability and the strong influence of data and annotation quality on downstream political-stance tasks, calling for robust data design and prompting strategies in real-world applications.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in executing tasks based on natural language queries. However, these models, trained on curated datasets, inherently embody biases ranging from racial to national and gender biases. It remains uncertain whether these biases affect the performance of LLMs on certain tasks. In this study, we investigate the political biases of LLMs within the stance classification task, specifically examining whether these models tend to classify stances of one political orientation more accurately than those of another. Using three datasets, seven LLMs, and four distinct prompting schemes, we analyze the performance of LLMs on politically oriented statements and targets. Our findings reveal a statistically significant difference in the performance of LLMs across various politically oriented stance classification tasks. Furthermore, we observe that this difference primarily manifests at the dataset level, with models and prompting schemes showing statistically similar performance across different stance classification datasets. Lastly, we observe that LLMs have poorer stance classification accuracy when there is greater ambiguity in the target toward which the statement is directed. Code & Dataset: http://doi.org/10.5281/zenodo.12938478
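The left-versus-right comparison described above comes down to a difference in accuracy for each experimental configuration. The following is a minimal sketch, assuming three things. First, each prediction is tagged with its dataset, model, prompting scheme, and the political orientation of its stance. Second, performance is measured as plain accuracy. Third, the difference is taken as accuracy on right-leaning stances minus accuracy on left-leaning stances, which matches the sign convention in the Figure 1 caption. The `Prediction` record, its field names, and the toy labels are hypothetical illustrations, not the authors' released code or data.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Prediction(NamedTuple):
    """One LLM stance prediction (hypothetical record layout)."""
    dataset: str
    model: str
    prompt: str
    orientation: str  # political orientation of the stance: "left" or "right"
    gold: str         # annotated stance label
    pred: str         # stance label predicted by the LLM


Config = Tuple[str, str, str]  # (dataset, model, prompting scheme)


def left_right_differences(rows: Iterable[Prediction]) -> Dict[Config, float]:
    """Return accuracy(right) - accuracy(left) per (dataset, model, prompt).

    Negative values mean higher accuracy on left-leaning stances, and positive
    values mean higher accuracy on right-leaning stances (Figure 1's convention).
    Configurations lacking either orientation are skipped.
    """
    correct: Dict[tuple, int] = defaultdict(int)
    total: Dict[tuple, int] = defaultdict(int)
    for r in rows:
        key = (r.dataset, r.model, r.prompt, r.orientation)
        total[key] += 1
        correct[key] += int(r.gold == r.pred)

    diffs: Dict[Config, float] = {}
    for cfg in sorted({key[:3] for key in total}):
        left, right = cfg + ("left",), cfg + ("right",)
        if total.get(left, 0) and total.get(right, 0):
            acc_left = correct[left] / total[left]
            acc_right = correct[right] / total[right]
            diffs[cfg] = acc_right - acc_left
    return diffs


# Toy usage with invented labels (not data from the paper).
rows = [
    Prediction("toy", "model-a", "zero-shot", "left", "favor", "favor"),
    Prediction("toy", "model-a", "zero-shot", "left", "against", "favor"),
    Prediction("toy", "model-a", "zero-shot", "right", "favor", "favor"),
    Prediction("toy", "model-a", "zero-shot", "right", "against", "against"),
]
print(left_right_differences(rows))  # {('toy', 'model-a', 'zero-shot'): 0.5}
```

Collecting these per-configuration values across models and prompting schemes yields the kind of distribution that a density plot such as Figure 1 summarizes. Grouping the keys by dataset instead of by model or prompt is one way to probe the dataset-level effect reported above.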

Paper Structure

This paper contains 21 sections, 2 figures, 12 tables.

Introduction
Related Work
Methodology
Datasets
Classifying Political Orientation
Experiments
Prompting Schemes
LLM Setups
Evaluation
Measurement of task accuracy
Characterization of political orientations of stances
Ablation Studies with Target Alterations
Results
Consistency of results through different target alterations
Illustrative Examples
...and 6 more sections

Figures (2)

Figure 1: Density plots of the differences in LLM performance between left- and right-leaning stances in the stance detection task. Negative values indicate better performance on left-leaning stances, and positive values indicate better performance on right-leaning stances.
Figure 2: P-values of pairwise Wilcoxon Signed-Rank Test comparing accuracy scores of target alteration types.
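Figure 2 reports p-values from pairwise Wilcoxon signed-rank tests between target alteration types. Below is a minimal, dependency-free sketch of such a procedure. It rests on assumptions that the summary does not state. Accuracy scores are paired by configuration, for example the same dataset, model, and prompting scheme under each alteration type. Zero differences are discarded and tied magnitudes receive average ranks. P-values are exact and two-sided, obtained by enumerating every sign assignment, and no multiple-comparison correction is applied. In practice `scipy.stats.wilcoxon` is the standard implementation. The alteration-type names and scores below are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_exact(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided exact Wilcoxon signed-rank p-value for paired samples x and y.

    Zero differences are dropped, and the null distribution is built by
    enumerating all 2**n sign assignments, so this is only practical for
    small n (roughly 20 pairs or fewer).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 1.0
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    center = sum(ranks) / 2  # mean of W+ under the null hypothesis
    observed = abs(w_plus - center)
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(w - center) >= observed - 1e-12:  # tolerance for float ranks
            extreme += 1
    return extreme / 2 ** n


def pairwise_wilcoxon(scores: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """p-value for every pair of alteration types.

    scores[t][i] must be the accuracy of configuration i under alteration type t,
    with the same configuration order for every type.
    """
    return {
        (a, b): wilcoxon_exact(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


# Toy usage with invented accuracies (not results from the paper).
scores = {
    "original": [0.90, 0.80, 0.70, 0.60, 0.50],
    "altered": [0.85, 0.70, 0.55, 0.40, 0.25],
}
print(pairwise_wilcoxon(scores))  # {('altered', 'original'): 0.0625}
```

With only five paired configurations, the smallest attainable two-sided exact p-value is 2/32 = 0.0625, so a test of this kind needs enough configurations per alteration type before conventional significance thresholds become reachable.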
