Synergy-of-Thoughts: Eliciting Efficient Reasoning in Hybrid Language Models

Yu Shang, Yu Li, Fengli Xu, Yong Li

TL;DR

This paper introduces SoT, a dual-process-inspired framework that combines cheap System 1-style intuition from multiple small LLMs with selective System 2-style intervention from a larger LLM. A confidence evaluator cross-evaluates the intuitive thoughts and triggers System 2 only when necessary, governed by a tunable threshold that rises with reasoning steps. The approach is model-agnostic and training-free. On six reasoning tasks it achieves state-of-the-art accuracy and solution diversity while cutting API costs by 38.3%-75.1%, with the average token-cost reduction on open-ended tasks reaching up to 69.1%. The work demonstrates that adaptive synergy between LLMs of different scales can deliver both high-quality reasoning and substantial practical cost savings.

Abstract

Large language models (LLMs) have shown impressive emergent abilities across a wide range of tasks, but their high API cost greatly limits real-world application. Previous works such as chain-of-thought (CoT) and tree-of-thoughts (ToT) have predominantly focused on enhancing accuracy while overlooking the rapidly increasing API cost, which can be particularly problematic for open-ended real-world tasks with huge solution spaces. Motivated by the dual process theory of human cognition, we propose "Synergy of Thoughts" (SoT) to unleash the synergistic potential of hybrid LLMs of different scales for efficient reasoning. By default, SoT uses smaller-scale language models to generate multiple low-cost intuitive thoughts, which resemble the parallel intuitions produced by System 1. We then design a confidence evaluator in which the intuitive thoughts are cross-evaluated, and introduce a controllable threshold mechanism to decide whether they conflict. If the intuitive thoughts conflict, SoT invokes the reflective reasoning of scaled-up language models to emulate the intervention of System 2, which overrides the intuitive thoughts and rectifies the reasoning results. The framework is model-agnostic and training-free, and can be flexibly implemented with various off-the-shelf LLMs. Experiments on six representative reasoning tasks show that SoT reduces API cost by 38.3%-75.1% while simultaneously achieving state-of-the-art reasoning accuracy and solution diversity. Notably, the average token cost reduction on open-ended tasks reaches up to 69.1%.
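
The control flow described in the abstract can be illustrated with a minimal sketch of a single reasoning step. This is not the authors' implementation: the names `cross_evaluate` and `sot_step`, the callable interfaces for the small and large models, the averaging of peer scores, and the rule "fall back to the large model when the best peer-rated confidence is below the threshold" are all assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   a Thinker maps a prompt to a candidate thought,
#   a Scorer rates a (prompt, thought) pair with a confidence in [0, 1].
Thinker = Callable[[str], str]
Scorer = Callable[[str, str], float]


def cross_evaluate(prompt: str, thoughts: Sequence[str],
                   scorers: Sequence[Scorer]) -> List[float]:
    """Rate each thought with every *other* model's scorer and average."""
    if len(thoughts) < 2 or len(thoughts) != len(scorers):
        raise ValueError("need one scorer per thought and at least two models")
    confidences = []
    for i, thought in enumerate(thoughts):
        peer_scores = [score(prompt, thought)
                       for j, score in enumerate(scorers) if j != i]
        confidences.append(mean(peer_scores))
    return confidences


def sot_step(prompt: str, small_thinkers: Sequence[Thinker],
             small_scorers: Sequence[Scorer], large_thinker: Thinker,
             threshold: float) -> Tuple[str, bool]:
    """Run one step; return (thought, whether the large model was invoked)."""
    # System 1: cheap intuitive thoughts from the small models.
    thoughts = [think(prompt) for think in small_thinkers]
    confidences = cross_evaluate(prompt, thoughts, small_scorers)
    best = max(range(len(thoughts)), key=lambda i: confidences[i])
    if confidences[best] >= threshold:
        return thoughts[best], False
    # System 2: low confidence is treated as a conflict (assumed rule),
    # so the large model's thought overrides the intuitions.
    return large_thinker(prompt), True
```

In this sketch the threshold is a plain argument, so a caller could raise it at later reasoning steps or sweep it to trade cost against accuracy; how SoT actually schedules the threshold is not specified in this section.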

Paper Structure (26 sections, 11 equations, 16 figures, 12 tables, 2 algorithms)

Figures (16)

  • Figure 1: An illustration of dual process theory (a) and the main differences between SoT (b) and prior works (c) (d) (e). SoT is designed following the synergy paradigm of dual processes in human reasoning.
  • Figure 2: Overview of SoT illustrated with a two-step reasoning problem from the Open-ended QA task (making an outline in the first step and giving the answer in the second step). SoT prioritizes reasoning with default intuitions (System 1). When multiple intuitions are evaluated as conflicting and low-confidence, SoT intervenes with reflective reasoning (invoking System 2) to override them.
  • Figure 3: Reasoning cost-accuracy trade-off under different threshold values in $SoT_O$ on (a) Game of 24 and (b) Trivia Creative Writing. The numbers in the figure denote the chosen threshold values.
  • Figure 4: Reasoning accuracy and solution diversity versus token costs/TFLOPS on Game of 24 (a) (b) and Trivia Creative Writing (c) (d). SoT achieves a better performance-cost trade-off than all compared methods.
  • Figure 5: Reasoning accuracy versus token costs/TFLOPS on the Logic Grid Puzzle task. SoT achieves a better performance-cost trade-off than all compared methods.
  • ...and 11 more figures