More Bias, Less Bias: BiasPrompting for Enhanced Multiple-Choice Question Answering

Duc Anh Vu, Thong Nguyen, Cong-Duy Nguyen, Viet Anh Nguyen, Anh Tuan Luu

TL;DR

BiasPrompting addresses bias and contextual grounding gaps in MCQ answering by forcing LLMs to generate and compare reasoning for all answer options before deciding. The method comprises two stages: Reasoning Generation and Reasonings-Guided Agreement, enabling comprehensive exploration of options and more informed final predictions. Empirical results across five MCQ benchmarks with open-source 7B LLMs show consistent accuracy gains, greater stability than Chain-of-Thought prompting, and reduced token usage, while mitigating option-order biases. The work highlights latent reasoning capabilities within a single LLM and suggests BiasPrompting as a practical, efficient approach to enhancing MCQ reasoning and potentially broader reasoning tasks.

Abstract

With the advancement of large language models (LLMs), their performance on multiple-choice question (MCQ) tasks has improved significantly. However, existing approaches face key limitations: answer choices are typically presented to LLMs without contextual grounding or explanation. This absence of context can lead to incomplete exploration of all possible answers, ultimately degrading the models' reasoning capabilities. To address these challenges, we introduce BiasPrompting, a novel inference framework that guides LLMs to generate and critically evaluate reasoning across all plausible answer options before reaching a final prediction. It consists of two components: first, a reasoning generation stage, where the model is prompted to produce supportive reasonings for each answer option, and then, a reasoning-guided agreement stage, where the generated reasonings are synthesized to select the most plausible answer. Through comprehensive evaluations, BiasPrompting demonstrates significant improvements in five widely used multiple-choice question answering benchmarks. Our experiments showcase that BiasPrompting enhances the reasoning capabilities of LLMs and provides a strong foundation for tackling complex and challenging questions, particularly in settings where existing methods underperform.
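The two stages described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the helper names (`build_reasoning_prompt`, `build_agreement_prompt`, `bias_prompting`), and the `generate` callable standing in for an LLM call are all assumptions made for the example.

```python
from typing import Callable, Dict


def build_reasoning_prompt(question: str, option: str) -> str:
    # Stage 1 prompt (hypothetical wording): ask for reasoning that supports one option.
    return (
        f"Question: {question}\n"
        f"Assume the correct answer is: {option}\n"
        "Explain why this answer could be correct."
    )


def build_agreement_prompt(
    question: str, options: Dict[str, str], reasonings: Dict[str, str]
) -> str:
    # Stage 2 prompt (hypothetical wording): show every option with its reasoning.
    lines = [f"Question: {question}", "Options and supporting reasonings:"]
    for label, text in options.items():
        lines.append(f"({label}) {text}")
        lines.append(f"Reasoning: {reasonings[label]}")
    lines.append(
        "Considering all reasonings, reply with the letter of the most plausible option."
    )
    return "\n".join(lines)


def bias_prompting(
    question: str, options: Dict[str, str], generate: Callable[[str], str]
) -> str:
    # Stage 1: one supportive reasoning per answer option.
    reasonings = {
        label: generate(build_reasoning_prompt(question, text))
        for label, text in options.items()
    }
    # Stage 2: a single call that weighs all reasonings and picks an answer.
    reply = generate(build_agreement_prompt(question, options, reasonings))
    # Naive parsing (an assumption): return the first token that is an option label.
    for token in reply.split():
        candidate = token.strip("().:,").upper()
        if candidate in options:
            return candidate
    raise ValueError(f"No option label found in reply: {reply!r}")
```

With `n` options the sketch makes `n + 1` calls to `generate`: one per option in the first stage and one for the final decision. The answer parsing is deliberately simple and would likely need hardening for real model output.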

Paper Structure

This paper contains 19 sections, 1 equation, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustration of BiasPrompting with two steps of Reasoning Generation and Reasonings-Guided Agreement.
  • Figure 2: Number of questions successfully answered by each prompting method while the other two underperform.
  • Figure 3: Violin plots comparing the average generated tokens from BiasPrompting and CoT prompting across five datasets (CSQA, SQA, PIQA, DU, and CJ) using the Gemma model. One set of violins represents BiasPrompting and the other CoT prompting. The dashed line inside each violin indicates the mean number of generated tokens.
  • Figure 4: Results of option order swapping on zero-shot and BiasPrompting performance across random option orderings. One set of box plots denotes BiasPrompting and the other zero-shot.
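
The option-order swapping in Figure 4 relies on presenting the same question with its options in different random orders. A minimal sketch of one such reordering is shown below; the function name `shuffle_options` and the assumption that option texts are distinct are illustrative choices, not details taken from the paper.

```python
import random
from typing import Dict, Tuple


def shuffle_options(
    options: Dict[str, str], answer: str, rng: random.Random
) -> Tuple[Dict[str, str], str]:
    # Keep the labels (A, B, ...) in place and permute the option texts under them.
    labels = list(options)
    texts = [options[label] for label in labels]
    correct_text = options[answer]
    rng.shuffle(texts)
    shuffled = dict(zip(labels, texts))
    # Track where the correct text landed (assumes option texts are distinct).
    new_answer = labels[texts.index(correct_text)]
    return shuffled, new_answer
```

Calling this repeatedly with a seeded `random.Random` and scoring each reordered question gives the per-ordering accuracies that a box plot like Figure 4 would summarize.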