DCR: Divide-and-Conquer Reasoning for Multi-choice Question Answering with LLMs

Zijie Meng, Yan Zhang, Zhaopeng Feng, Zuozhu Liu

TL;DR

This work tackles the uniform-processing bias in LLM-based MCQ reasoning by introducing Divide and Conquer Reasoning (DCR), which partitions questions according to a confidence score $\mathcal{CS}$ estimated from multiple Zero-Shot-CoT runs and then applies Filter Choices based Reasoning (FCR) to the low-$\mathcal{CS}$ subset. The method reduces inference cost to about 85% of the state of the art while delivering a solid average improvement of $1.56\%$ across nine diverse datasets spanning arithmetic, commonsense, and logic tasks, and proves effective across multiple LLMs. Key contributions include pioneering dataset-level division for LLM reasoning, demonstrating cost-accuracy tradeoffs through $\mathcal{CS}$-based partitioning and FCR, and validating generalization to cloze-style data like GSM8K. The results highlight practical gains in efficiency and accuracy, with insights into the relationship between confidence, distractors, and optimal allocation of reasoning resources.

Abstract

Large language models (LLMs) have shown impressive performance on reasoning benchmarks with the emergence of Chain-of-Thought (CoT), particularly on multi-choice questions (MCQs). However, current works resolve all questions in the same way regardless of problem-solving difficulty, leading to an excessive focus on simple items and insufficient attention to intricate ones. To address this challenge, we propose a simple yet effective strategy, Divide and Conquer Reasoning (DCR), to enhance the reasoning capability of LLMs for MCQs, inspired by how human beings use heuristics to first categorize tasks and then handle them separately. In particular, we first categorize questions into two subsets based on a confidence score ($\mathcal{CS}$), which is estimated from the statistical frequency of generated answers. Subsequently, we propose Filter Choices based Reasoning (FCR) to improve model performance on MCQs with low $\mathcal{CS}$. Our experiments demonstrate that the proposed strategy costs only 85% of SOTA while still achieving an average accuracy improvement of 1.56% across nine datasets covering arithmetic, commonsense, and logic reasoning tasks. The code is available at https://github.com/AiMijie/Divide-and-Conquer

Paper Structure

This paper contains 18 sections, 3 equations, 12 figures, and 10 tables.

Figures (12)

  • Figure 1: Illustration of DCR. (1) Divide. We first run inference $t$ times (e.g. $t$=5) with Zero-Shot-CoT, using the prompt "Let's think step by step." The dataset $\mathbb{D}$ is then divided based on $\mathcal{CS}$: DataItems with $\mathcal{CS}$ less than $\mu$ (e.g. $\mu$=0.6) are categorized as $\mathbb{D}_{low}$, and the rest as $\mathbb{D}_{other}$. (2) Conquer. We fix $\mathbb{D}_{other}$ and propose FCR to process $\mathbb{D}_{low}$. A "DataItem" in the Divide area includes the question text and the full choices list, while in the Conquer area it includes only the filtered choices list. "Rationale$i$_$j$" denotes the rationale generated by the $j$-th LLM query for the $i$-th DataItem. "Choice_$x$" represents the $x$-th option in the original DataItem.
  • Figure 2: Average accuracy and #Call across different datasets. See Figure \ref{fig:trend_ac_cost_detail} for details about each dataset.
  • Figure 3: Average Prior accuracy on different subsets for various sample sizes $t$. See Figure \ref{fig:scNumDivide_detail_acc} for details about each dataset.
  • Figure 4: Average size of the different subsets for various sample sizes $t$. See Figure \ref{fig:scNumDivide_detail_size} for details about each dataset.
  • Figure 5: Distribution of different subsets among three tasks. See Figure \ref{fig:prosubsetDiffDataset} for details about each dataset.
  • ...and 7 more figures
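
The Divide step described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes $\mathcal{CS}$ is the fraction of the $t$ sampled answers that agree with the most frequent answer, which is one reading of "statistical frequency of generated answers". The function names `confidence_score` and `divide`, the item ids, and the sampled answers are all hypothetical; only $t$=5 and $\mu$=0.6 come from the caption.

```python
from collections import Counter


def confidence_score(answers):
    """Return (most frequent answer, its share of all sampled answers).

    Assumption: CS is the relative frequency of the majority answer.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    top_answer, top_count = Counter(answers).most_common(1)[0]
    return top_answer, top_count / len(answers)


def divide(sampled, mu=0.6):
    """Split item ids into (d_low, d_other) by comparing CS with mu.

    Items with CS strictly less than mu go to d_low, the rest to d_other,
    mirroring the rule stated in the Figure 1 caption.
    """
    d_low, d_other = [], []
    for item_id, answers in sampled.items():
        _, cs = confidence_score(answers)
        (d_low if cs < mu else d_other).append(item_id)
    return d_low, d_other


# Hypothetical answers from t = 5 Zero-Shot-CoT runs per question.
sampled = {
    "q1": ["A", "A", "A", "A", "B"],  # majority share 4/5
    "q2": ["A", "B", "C", "B", "D"],  # majority share 2/5
    "q3": ["C", "C", "C", "D", "D"],  # majority share 3/5
}
d_low, d_other = divide(sampled, mu=0.6)
print("D_low:", d_low)
print("D_other:", d_other)
```

Under these assumptions only `q2` falls below the threshold, so it alone would be passed on to FCR; `q3` sits exactly at $\mu$ and stays in $\mathbb{D}_{other}$ because the caption's rule is a strict "less than".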