Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration
Ang Li, Jingqian Zhao, Bin Liang, Lin Gui, Hui Wang, Xi Zeng, Xingwei Liang, Kam-Fai Wong, Ruifeng Xu
TL;DR
The paper tackles biases in large language models (LLMs) for stance detection, identifying two key bias types: sentiment-stance spurious correlations (SSC) and target preference bias (TPB). It introduces FACTUAL, a Counterfactual Augmented Calibration Network that uses a calibration module and counterfactual augmented data to debias predictions and improve out-of-domain generalization, formalizing a joint objective that combines standard supervision with non-causal and causal counterfactual losses. Empirical results on SemEval-2016, P-Stance, and VAST show state-of-the-art performance in both in-target and zero-shot stance detection, accompanied by substantial reductions in SSC and TPB biases, as measured by the proposed Bias-SSC and Bias-TPB metrics. The approach demonstrates robust debiasing and improved generalization, offering a practical pathway to safer and more reliable stance analysis with LLMs in real-world applications.
Abstract
Stance detection is critical for understanding the underlying position or attitude expressed toward a topic. Large language models (LLMs) have demonstrated significant advancements across various natural language processing tasks, including stance detection. However, their performance in stance detection is limited by the biases and spurious correlations that arise from their data-driven nature. Our statistical experiment reveals that LLMs are prone to generating biased stances due to sentiment-stance spurious correlations and a preference towards certain individuals and topics. Furthermore, the results demonstrate a strong negative correlation between stance bias and stance detection performance, underscoring the importance of mitigating bias to enhance the utility of LLMs in stance detection. Therefore, in this paper, we propose a Counterfactual Augmented Calibration Network (FACTUAL), in which a novel calibration network is devised to calibrate potential bias in the stance predictions of LLMs. Further, to address the challenge of effectively learning bias representations and the difficulty of generalizing debiasing, we construct counterfactual augmented data. This data enhances the calibration network, facilitating both debiasing and out-of-domain generalization. Experimental results on in-target and zero-shot stance detection tasks show that the proposed FACTUAL can effectively mitigate the biases of LLMs, achieving state-of-the-art results.
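As a rough illustration of the joint objective mentioned in the TL;DR (standard supervision combined with non-causal and causal counterfactual losses), the sketch below sums three cross-entropy terms. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the use of cross-entropy for every term, and the weights `alpha` and `beta` are all hypothetical, and the toy probabilities are made up purely for illustration.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the gold label under a predicted distribution."""
    # Clamp to avoid log(0) on degenerate predictions.
    return -math.log(max(probs[label], 1e-12))

def mean_loss(batch):
    """Average cross-entropy over a batch of (probs, label) pairs."""
    return sum(cross_entropy(p, y) for p, y in batch) / len(batch)

def joint_objective(original, non_causal_cf, causal_cf, alpha=1.0, beta=1.0):
    """Hypothetical joint loss: supervised term plus two weighted counterfactual terms.

    original      -- predictions on the original samples
    non_causal_cf -- predictions on non-causal counterfactual samples (assumed)
    causal_cf     -- predictions on causal counterfactual samples (assumed)
    alpha, beta   -- illustrative weights, not taken from the paper
    """
    return (
        mean_loss(original)
        + alpha * mean_loss(non_causal_cf)
        + beta * mean_loss(causal_cf)
    )

# Toy example: one sample per term, three stance classes (indices 0, 1, 2).
original = [([0.7, 0.2, 0.1], 0)]
non_causal_cf = [([0.6, 0.3, 0.1], 0)]
causal_cf = [([0.2, 0.7, 0.1], 1)]
loss = joint_objective(original, non_causal_cf, causal_cf, alpha=0.5, beta=0.5)
```

Setting `alpha` and `beta` to zero reduces this sketch to plain supervised training, which makes the contribution of each counterfactual term easy to inspect in isolation.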
