Balancing Rigor and Utility: Mitigating Cognitive Biases in Large Language Models for Multiple-Choice Questions
Hanyang Zhong, Liman Wang, Wenting Cao, Zeyuan Sun
TL;DR
This work investigates cognitive biases in how LLMs answer multiple-choice questions (MCQs), arguing that rational deviations can enhance efficiency when properly moderated. It introduces an abstention option and heuristic moderation, evaluated on the BRU dataset with GPT-4, Gemini 1.0 Pro, and LLaMA3-70B, along with a Bias Detection Loop that transitions from general to specific bias inspection (SBI). Key findings show that SBI combined with abstention yields the highest accuracy and lowest error, while scaling bias inspection and dynamic bias detection improve reliability and human-aligned reasoning. The results suggest a practical framework for deploying LLMs in decision-support tasks where uncertainty and bias risk are nontrivial, enabling more trusted and efficient AI reasoning.
Abstract
This paper examines the role of cognitive biases in the decision-making processes of large language models (LLMs), challenging the conventional goal of eliminating all biases. We show that certain cognitive biases, when properly balanced, can enhance decision-making efficiency through rational deviations and heuristic shortcuts. We introduce heuristic moderation and an abstention option, which allows LLMs to withhold a response when uncertain; together these reduce error rates, improve decision accuracy, and optimize decision rates. Using the Balance Rigor and Utility (BRU) dataset, developed through expert collaboration, we find that targeted inspection of cognitive biases aligns LLM decisions more closely with human reasoning, which enhances reliability and suggests strategies for future improvement. This approach offers a novel way to leverage cognitive biases to improve the practical utility of LLMs across various applications.
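To make the trade-off between abstention and the three reported quantities concrete, the following is a minimal sketch of how such metrics could be computed for a batch of MCQ answers. The definitions used here are assumptions for illustration, not taken from the paper: decision rate is the share of questions answered, accuracy is measured over answered questions only, and error rate is the share of all questions answered incorrectly. The function name and the use of `None` as the abstention marker are likewise hypothetical.

```python
from typing import Dict, Optional, Sequence

# Hypothetical marker: None means the model abstained on that question.
ABSTAIN = None


def score_with_abstention(
    predictions: Sequence[Optional[str]],
    gold: Sequence[str],
) -> Dict[str, float]:
    """Score MCQ predictions when abstention is allowed.

    Assumed definitions (illustrative, not the paper's):
      decision_rate = answered / total
      accuracy      = correct / answered
      error_rate    = wrong / total
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")

    total = len(gold)
    # Keep only the questions the model chose to answer.
    decided = [(p, g) for p, g in zip(predictions, gold) if p is not ABSTAIN]
    correct = sum(1 for p, g in decided if p == g)
    wrong = len(decided) - correct

    return {
        "decision_rate": len(decided) / total if total else 0.0,
        "accuracy": correct / len(decided) if decided else 0.0,
        "error_rate": wrong / total if total else 0.0,
    }


if __name__ == "__main__":
    # Four questions: two correct, one abstention, one wrong.
    scores = score_with_abstention(["A", None, "C", "B"], ["A", "B", "C", "D"])
    print(scores)
```

Under these assumed definitions, abstaining on a question the model would have answered incorrectly lowers the error rate and raises accuracy, at the cost of a lower decision rate.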
