ThinkSwitcher: When to Think Hard, When to Think Fast

Guosheng Liang, Longguang Zhong, Ziyi Yang, Xiaojun Quan

TL;DR

Large reasoning models (LRMs) often over-elaborate on simple tasks, incurring high compute costs. ThinkSwitcher adds a lightweight switcher that selects between short and long chain-of-thought (CoT) modes based on task complexity. The switcher is trained on self-supervised signals derived from the model's own performance in each mode, and the short mode is elicited through prompting alone. The approach reduces token usage by 20–30% with minimal accuracy loss across diverse benchmarks and model scales, outperforming static prompting and simple baselines while keeping a single-model deployment. The result is a scalable, adaptive reasoning framework suited to high-throughput settings and unified LRM deployment.

Abstract

Large reasoning models (LRMs) excel at solving complex tasks by leveraging long chain-of-thought (CoT) reasoning. However, this often leads to overthinking on simple tasks, resulting in unnecessary computational overhead. We observe that LRMs inherently possess the capability for efficient short CoT reasoning, which can be reliably elicited through prompt design. To leverage this capability, we propose ThinkSwitcher, a framework that enables a single LRM to dynamically switch between short and long CoT modes based on task complexity. ThinkSwitcher introduces a lightweight switching module trained with supervision signals derived from the relative performance of each reasoning mode across tasks. Experiments on multiple reasoning benchmarks show that ThinkSwitcher reduces computational cost by 20–30% while maintaining high accuracy on complex tasks. These results demonstrate the effectiveness of ThinkSwitcher as a scalable and efficient solution for unified LRM deployment.

Paper Structure

This paper contains 38 sections, 12 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Comparison of long and short CoTs generated using different prompting strategies with DeepSeek-R1-Distill-Qwen-7B. While long CoT reasoning often leads to overthinking and excessive token consumption due to elaborate reasoning steps, short CoT can deliver comparable accuracy with substantially fewer tokens.
  • Figure 2: Comparison of long CoT and induced short CoT on MATH500. "R1" denotes the DeepSeek-R1-Distill series. Left: Accuracy comparison between long CoT and short CoT. Right: Average token usage for each reasoning mode, showing substantial token reductions with short CoT. Inducing short CoT consistently achieves substantial token savings while maintaining competitive accuracy across diverse LRMs.
  • Figure 3: Dynamic mode selection during inference. Given a question embedding from the LRM, ThinkSwitcher dynamically chooses between short and long CoT reasoning based on estimated task difficulty.
  • Figure 4: Trade-off between average accuracy and cost (measured by average output tokens) for the three DeepSeek-R1-Distill-Qwen model sizes. Each point on the ThinkSwitcher curves corresponds to a different $\tau$ value.
  • Figure 5: Performance with different values of $k$ used to estimate pass rates in training data construction.
  • ...and 5 more figures
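
The captions for Figures 3 to 5 mention three quantities: pass rates estimated from $k$ sampled responses, a switcher that chooses between short and long CoT, and a threshold $\tau$ that traces out the accuracy versus cost curve. The Python sketch below shows one way these pieces could fit together. It is a minimal illustration, not the authors' implementation: the decision rule (pick long CoT when the predicted long-mode pass rate exceeds the short-mode one by at least `tau`), the names `empirical_pass_rate`, `ModeScores`, `select_mode`, and `average_cost`, and the token costs in the usage note are all assumptions introduced here.

```python
from dataclasses import dataclass


def empirical_pass_rate(outcomes):
    """Fraction of correct answers among k sampled responses for one mode."""
    if not outcomes:
        raise ValueError("need at least one sampled response")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


@dataclass
class ModeScores:
    """Pass rates (measured or predicted) for one question, per mode."""
    short: float
    long: float


def select_mode(scores, tau):
    # Assumed rule: use long CoT only when its expected advantage over
    # short CoT is at least tau; otherwise fall back to the cheaper mode.
    return "long" if scores.long - scores.short >= tau else "short"


def average_cost(questions, tau, short_tokens, long_tokens):
    """Mean output tokens per question for a given threshold tau.

    short_tokens and long_tokens are hypothetical per-mode token costs.
    """
    total = 0
    for scores in questions:
        mode = select_mode(scores, tau)
        total += long_tokens if mode == "long" else short_tokens
    return total / len(questions)
```

Under these assumptions, raising `tau` routes more questions to short CoT and lowers the average cost, which matches the idea that each point on a curve in Figure 4 corresponds to a different $\tau$. For example, with two questions scored `ModeScores(0.9, 0.95)` and `ModeScores(0.2, 0.8)`, a threshold of `0.1` sends only the second question to long CoT.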