Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration

Ang Li; Jingqian Zhao; Bin Liang; Lin Gui; Hui Wang; Xi Zeng; Xingwei Liang; Kam-Fai Wong; Ruifeng Xu

TL;DR

The paper tackles biases in large language models (LLMs) for stance detection, identifying two key bias types: sentiment-stance spurious correlations (SSC) and target preference bias (TPB). It introduces FACTUAL, a Counterfactual Augmented Calibration Network that uses a calibration module and counterfactual augmented data to debias predictions and improve out-of-domain generalization, formalizing a joint objective that combines standard supervision with non-causal and causal counterfactual losses. Empirical results on SemEval-2016, P-Stance, and VAST show state-of-the-art performance in both in-target and zero-shot stance detection, accompanied by substantial reductions in SSC and TPB biases, as measured by the proposed Bias-SSC and Bias-TPB metrics. The approach demonstrates robust debiasing and improved generalization, offering a practical pathway to safer and more reliable stance analysis with LLMs in real-world applications.

Abstract

Stance detection is critical for understanding the underlying position or attitude expressed toward a topic. Large language models (LLMs) have demonstrated significant advancements across various natural language processing tasks, including stance detection; however, their performance in stance detection is limited by the biases and spurious correlations inherent in their data-driven nature. Our statistical experiment reveals that LLMs are prone to generating biased stances due to sentiment-stance spurious correlations and a preference towards certain individuals and topics. Furthermore, the results demonstrate a strong negative correlation between stance bias and stance detection performance, underscoring the importance of mitigating bias to enhance the utility of LLMs in stance detection. Therefore, in this paper, we propose a Counterfactual Augmented Calibration Network (FACTUAL), in which a novel calibration network is devised to calibrate potential bias in the stance prediction of LLMs. Further, to address the challenge of effectively learning bias representations and the difficulty of generalizing debiasing, we construct counterfactual augmented data. This approach enhances the calibration network, facilitating debiasing and out-of-domain generalization. Experimental results on in-target and zero-shot stance detection tasks show that the proposed FACTUAL can effectively mitigate biases of LLMs, achieving state-of-the-art results.

Paper Structure (31 sections, 11 equations, 11 figures, 21 tables)

Introduction
Related Work
Biases in Large Language Models
Mitigating Biases in Stance Detection
Biases of LLMs in Stance Detection
Bias Measurement
Experimental Result
Sentiment-Stance Spurious Correlations
Target Preference Bias
Mitigating Bias with Calibration
Calibration Network
Counterfactual Data Augmentation
Training Objective
Experimental Setup
Datasets
...and 16 more sections

Figures (11)

Figure 1: An example demonstrates two types of biases encountered by large language models in stance detection tasks (shown at the top and bottom) as well as unbiased stance rationale (shown in the middle).
Figure 2: The recall score of each stance label on three sentiment subsets, normalized by subtracting the overall recall score of the corresponding stance label across the whole dataset, on Sem16, P-Stance, and VAST. POS for positive, NEU for neutral, NEG for negative.
Figure 3: The recall score of each stance label on several target subsets, normalized by subtracting the overall recall score of the corresponding stance label across all targets, on the Sem16, P-Stance, and VAST datasets. HC for Hillary Clinton, LA for Legalization of Abortion, AT for Atheism, JB for Joe Biden, BS for Bernie Sanders, DT for Donald Trump, CH for Christian, CL for Election, HP for Humanity Program.
Figure 4: The overall architecture of our proposed FACTUAL. (a) and (b) in the counterfactual data generation represent two ways to generate counterfactual augmentation. $X$ denotes the text, $T$ denotes the target, $H$ denotes the features of the interaction of text and target, and $Y$ denotes the stance label. $C$ represents confounding factors, which arise from the two types of biases previously analyzed and may distort the stance prediction. $*$ denotes the perturbation of non-causal features, and $\sim$ denotes the perturbation of causal features.
Figure 5: Prompt template for sentiment label annotation by GPT-4. Fill the blue text with the corresponding text from the sample.
...and 6 more figures
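
The normalized recall described in the captions of Figures 2 and 3 (per-label recall on a subset minus that label's overall recall) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the label strings, and the toy data are all assumptions made for illustration, and the grouping key can be either a sentiment tag or a target name.

```python
from collections import defaultdict


def recall_per_label(gold, pred):
    """Recall for each gold stance label: correct predictions / gold count."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for g, p in zip(gold, pred):
        total[g] += 1
        if g == p:
            hit[g] += 1
    return {label: hit[label] / total[label] for label in total}


def normalized_subset_recall(gold, pred, groups):
    """Per-label recall inside each subset minus the overall per-label recall."""
    overall = recall_per_label(gold, pred)
    by_group = defaultdict(lambda: ([], []))
    for g, p, grp in zip(gold, pred, groups):
        by_group[grp][0].append(g)
        by_group[grp][1].append(p)
    result = {}
    for grp, (g_list, p_list) in by_group.items():
        sub = recall_per_label(g_list, p_list)
        # A label absent from a subset has no recall there and is skipped.
        result[grp] = {label: sub[label] - overall[label] for label in sub}
    return result


# Hypothetical toy data: the predictor follows sentiment instead of stance.
gold = ["favor", "favor", "against", "against"]
pred = ["favor", "against", "against", "favor"]
sentiment = ["POS", "NEG", "NEG", "POS"]

deltas = normalized_subset_recall(gold, pred, sentiment)
print(deltas)
```

In this toy case the "favor" recall is above its overall value on the positive subset and below it on the negative subset, which is the kind of pattern the figures use to expose a sentiment-stance spurious correlation. Values near zero in every subset would indicate that recall does not depend on the grouping variable.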
