DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models

Jiabao Pan; Yan Zhang; Chen Zhang; Zuozhu Liu; Hongwei Wang; Haizhou Li

TL;DR

This work tackles the efficiency–accuracy tension in large language model reasoning by introducing DynaThink, a dynamic decision-making framework that routes problems to Fast or Slow inference paths. The routing rests on two criteria: consistency verification (requiring $> \tfrac{1}{2}$ of votes) and reasoning-complexity verification (favoring fewer steps), enabling high-confidence, low-cost solutions when possible and more thorough multi-path reasoning when needed. Across six reasoning benchmarks and multiple LLMs, DynaThink improves both accuracy and efficiency relative to a Self-Consistency baseline, with notable gains in zero-shot and few-shot settings and compatible gains when combined with SelfCheck. The approach offers practical value for resource-constrained deployments and highlights a path toward more adaptive, cost-aware LLM reasoning.

Abstract

Large language models (LLMs) have demonstrated emergent capabilities across diverse reasoning tasks via the popular Chain-of-Thought (CoT) prompting. However, such a simple and fast CoT approach often struggles with complicated problems, while a thorough method, which considers multiple reasoning pathways and verifies each step carefully, results in slower inference. This paper addresses the challenge of enabling LLMs to autonomously select between fast and slow inference methods, thereby optimizing both efficiency and effectiveness. We introduce a dynamic decision-making framework that categorizes tasks into two distinct pathways: 'Fast', designated for tasks where the LLM quickly identifies a high-confidence solution, and 'Slow', allocated for tasks that the LLM perceives as complex, for which it has low confidence in immediate solutions and which require more reasoning paths to verify. Experiments on five popular reasoning benchmarks demonstrate the superiority of DynaThink over baselines.

Paper Structure (17 sections, 7 figures, 2 tables, 1 algorithm)

This paper contains 17 sections, 7 figures, 2 tables, 1 algorithm.

Introduction
DynaThink
Experiments
Setup
Main Results
Ablation Study of Reasoning Complexity Verification
Related Work
Conclusion
Limitations
Algorithm
Prompt Template
Ablation Study of Consistency Verification
Ablation Study of Verification Order
Number of test data
The rest data of ablation study of reasoning complexity verification
...and 2 more sections

Figures (7)

Figure 1: The workflow starts by incorporating the widely used CoT prompt, like 'let's think step by step' and then initially querying LLMs four times for each question (which can be fewer than four to begin with). In the first step, we choose question 1 and question 2 because they have one answer with more than half the votes. However, question 3 has a tie, with two answers getting equal votes, so we consider it a slow-thinking question. Next, we look at questions 1 and 2 to see how many steps are needed for each answer. For question 2, the answer with the most votes takes four steps, while the other answer needs three steps, so question 2 is also categorized as a slow-thinking question. Regarding question 1, we classify it as a fast-thinking question because the answer with the most votes also requires the fewest steps, allowing us to output this answer directly. However, for slow-thinking questions, we require additional iterations to make a selection.
Figure 2: SC: the original self-consistency approach, which uses majority voting to identify the most agreed-upon answer (Wang et al., 2022). DynaThink+SC: divides the question set into fast-thinking and slow-thinking categories and applies different selection criteria to the answers from each set. Both SC and DynaThink+SC use the CoT prompting technique, in both zero-shot and few-shot settings (Wei et al., 2022; Kojima et al., 2022). To ensure a fair comparison, DynaThink+SC also uses the SC strategy on the slow-thinking question set to determine the final answer. However, the number of LLM queries is adjusted slightly to accommodate the different processing requirements of the fast-thinking and slow-thinking question sets. We always ensure that the operational cost of DynaThink+SC is either lower than or competitive with that of the SC strategy. In essence, the goal of DynaThink+SC is to optimize efficiency and cost-effectiveness. SQA means StrategyQA. Due to space limitations, more results are presented in the appendix (section "The rest results of main result").
Figure 3: Correlation between the distribution of reasoning steps and reasoning performance. We employed the self-consistency strategy within the zero-shot and few-shot CoT prompting techniques to query the GPT-3.5-Turbo model two, five, and ten times for each question. Due to space limitations, more results are presented in the appendix (section "The rest data of ablation study of reasoning complexity verification").
Figure 4: Ablation Study of Consistency Verification. We employed the zero-shot and few-shot CoT prompting techniques to query GPT-3.5-Turbo five, seven, and ten times for each question. Three thresholds for consistency verification in DynaThink are considered: majority voting, more than half, and all the same.
Figure 5: Ablation Study of Verification Order. SC + Step (DynaThink): consistency verification is applied first, followed by reasoning complexity verification. Step + SC: reasoning complexity verification is applied first, followed by consistency verification. The zero-shot CoT prompting technique is used to query GPT-3.5-Turbo six times.
...and 2 more figures
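
The routing described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `route`, the `(answer, num_steps)` sample format, and the concrete step counts for question 1 and question 3 are hypothetical choices made for illustration. Two further assumptions are made where the caption is silent: an answer's step count is taken as the minimum over the samples that produced it, and a tie on step count still counts as "fewest".

```python
from collections import Counter


def route(samples):
    """Route one question to the fast or slow path.

    samples: list of (answer, num_steps) pairs, one per LLM query.
    Returns ("fast", answer) or ("slow", None).
    """
    votes = Counter(answer for answer, _ in samples)
    top_answer, top_votes = votes.most_common(1)[0]

    # Consistency verification: the top answer needs strictly more
    # than half of the votes; a tie sends the question to the slow path.
    if top_votes * 2 <= len(samples):
        return ("slow", None)

    # Reasoning complexity verification (assumption: an answer's step
    # count is the minimum over the samples that produced it).
    min_steps = {}
    for answer, steps in samples:
        min_steps[answer] = min(steps, min_steps.get(answer, steps))

    # Assumption: a tie on step count still counts as "fewest".
    if min_steps[top_answer] > min(min_steps.values()):
        return ("slow", None)

    return ("fast", top_answer)


# Hypothetical samples mirroring the three questions in Figure 1.
q1 = [("A", 3), ("A", 3), ("A", 3), ("B", 4)]  # majority, fewest steps
q2 = [("A", 4), ("A", 4), ("A", 4), ("B", 3)]  # majority, but more steps
q3 = [("A", 3), ("A", 3), ("B", 3), ("B", 3)]  # tied vote

for name, samples in [("q1", q1), ("q2", q2), ("q3", q3)]:
    print(name, route(samples))
```

In this sketch only question 1 is answered directly. Questions 2 and 3 fall to the slow path, where, per the caption, additional query iterations would be needed before selecting an answer; that loop is not shown here.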
