Mitigating Selection Bias with Node Pruning and Auxiliary Options
Hyeong Kyu Choi, Weijie Xu, Chi Xue, Stephanie Eckman, Chandan K. Reddy
TL;DR
This work tackles selection bias in how LLMs answer multiple-choice questions (MCQs), a problem that undermines accuracy and reliability in decision-critical tasks. It introduces two methods: BNP, a parameter pruning method that removes bias-interacting rows in the final output projection, and AOI, a prompting strategy that adds an explicit 'I don't know' option and so reduces bias even for black-box models. The authors also propose CKLD, a distribution-based metric that captures how closely the predicted answer distribution matches the ground-truth distribution, complementing existing bias measures such as RSD. Combining BNP and AOI consistently improves accuracy and bias metrics across several LLMs and datasets, with notable gains (e.g., up to a 24.9% accuracy improvement on ARC-Challenge for Llama-3), and the methods generalize across white-box and black-box settings, suggesting practical applicability to diverse MCQ tasks and prompting paradigms.
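To make the AOI idea concrete, here is a minimal sketch of appending an auxiliary 'I don't know' option to a multiple-choice prompt. The function name `inject_auxiliary_option`, the prompt layout, and the letter labels are assumptions chosen for illustration; the summary above does not specify the authors' exact prompt template.

```python
def inject_auxiliary_option(question, options, aux_text="I don't know"):
    """Build an MCQ prompt with one extra auxiliary option appended.

    Hypothetical helper: the prompt layout and labels are illustrative
    assumptions, not the authors' template.
    """
    labels = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    all_options = list(options) + [aux_text]
    if len(all_options) > len(labels):
        raise ValueError("too many options for single-letter labels")

    lines = [f"Question: {question}"]
    # Label the original options first, then the auxiliary option last.
    for label, text in zip(labels, all_options):
        lines.append(f"{label}. {text}")
    lines.append("Answer:")
    return "\n".join(lines)


prompt = inject_auxiliary_option(
    "Which planet is closest to the Sun?",
    ["Venus", "Mercury", "Mars", "Earth"],
)
print(prompt)
```

Because this only changes the prompt text, it needs no access to model weights, which is consistent with the claim that AOI applies to black-box models.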
Abstract
Large language models (LLMs) often exhibit systematic preferences for certain answer choices when responding to multiple-choice questions, a behavior known as selection bias. This bias reduces the accuracy and reliability of LLM outputs, limiting their usefulness in decision-critical applications. While prior work has focused on adjusting model inputs or outputs to mitigate this issue, our work takes a fundamentally different approach by identifying and removing the internal sources of bias. We introduce two methods: Bias Node Pruning (BNP), which prunes parameters that contribute to selection bias, and Auxiliary Option Injection (AOI), which introduces an additional answer choice to reduce bias in both white-box and black-box settings. To address the shortcomings of existing evaluation metrics, we propose Choice Kullback-Leibler Divergence (CKLD), a new metric that captures distributional imbalances in model predictions. Experiments on three LLMs across multiple datasets demonstrate that our methods consistently improve answer accuracy while reducing selection bias, providing a robust solution for both open- and closed-source models.
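As a rough illustration of what a metric like CKLD measures, the sketch below computes a Kullback-Leibler divergence between the empirical distribution of ground-truth answer labels and that of predicted labels. The direction of the divergence (ground truth relative to predictions), the smoothing constant `eps`, and the function names are assumptions made for this example; the abstract does not give the exact formula.

```python
import math
from collections import Counter


def choice_distribution(labels, choices, eps=1e-9):
    """Empirical distribution over answer choices, lightly smoothed.

    `eps` is an assumed smoothing constant that avoids log(0) when a
    choice never appears.
    """
    counts = Counter(labels)
    total = len(labels) + eps * len(choices)
    return [(counts.get(c, 0) + eps) / total for c in choices]


def ckld(true_labels, pred_labels, choices):
    """KL(ground-truth choice distribution || predicted choice distribution).

    Hypothetical sketch: the divergence direction is an assumption.
    """
    p = choice_distribution(true_labels, choices)
    q = choice_distribution(pred_labels, choices)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


choices = ["A", "B", "C", "D"]
truth = ["A", "B", "C", "D"] * 25

balanced = ckld(truth, list(reversed(truth)), choices)  # same label frequencies
skewed = ckld(truth, ["A"] * 100, choices)              # model always picks "A"
print(balanced, skewed)
```

The balanced case gets every item wrong yet scores near zero, because the divergence depends only on how often each choice is selected; this is why such a metric complements accuracy rather than replacing it.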
