ThinkSwitcher: When to Think Hard, When to Think Fast
Guosheng Liang, Longguang Zhong, Ziyi Yang, Xiaojun Quan
TL;DR
Large reasoning models (LRMs) often over-elaborate on simple tasks, incurring high compute costs. ThinkSwitcher adds a lightweight switcher that selects between short and long chain-of-thought (CoT) modes according to task complexity. The switcher is trained on self-supervised signals derived from the model's own performance in each mode, and the short CoT mode is elicited through prompting rather than a separate model. The approach reduces token usage by 20–30% with minimal accuracy loss across diverse benchmarks and model scales, outperforming static prompting and simple baselines while keeping a single-model deployment. This makes it an adaptive reasoning framework suited to high-throughput settings and unified deployment of LRMs.
Abstract
Large reasoning models (LRMs) excel at solving complex tasks by leveraging long chain-of-thought (CoT) reasoning. However, this often leads to overthinking on simple tasks, resulting in unnecessary computational overhead. We observe that LRMs inherently possess the capability for efficient short CoT reasoning, which can be reliably elicited through prompt design. To leverage this capability, we propose ThinkSwitcher, a framework that enables a single LRM to dynamically switch between short and long CoT modes based on task complexity. ThinkSwitcher introduces a lightweight switching module trained with supervision signals derived from the relative performance of each reasoning mode across tasks. Experiments on multiple reasoning benchmarks show that ThinkSwitcher reduces computational cost by 20-30% while maintaining high accuracy on complex tasks. This demonstrates the effectiveness of ThinkSwitcher as a scalable and efficient solution for unified LRM deployment.
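The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of empirical pass rates as supervision targets, and the margin-based decision rule (including the default margin of 0.1) are all assumptions made for illustration. The sketch assumes that several responses are sampled per prompt in each mode, that their correctness yields a per-mode pass rate, and that a trained switcher would later predict these rates from the prompt alone.

```python
from dataclasses import dataclass
from typing import Sequence


def pass_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of sampled responses that were correct."""
    if not outcomes:
        raise ValueError("need at least one sampled outcome")
    return sum(outcomes) / len(outcomes)


@dataclass
class ModeTargets:
    """Hypothetical supervision targets for one prompt."""
    short: float  # empirical pass rate with the short-CoT prompt
    long: float   # empirical pass rate with the long-CoT prompt


def build_targets(short_outcomes: Sequence[bool],
                  long_outcomes: Sequence[bool]) -> ModeTargets:
    """Derive per-mode targets from the model's own sampled answers."""
    return ModeTargets(pass_rate(short_outcomes), pass_rate(long_outcomes))


def choose_mode(pred_short: float, pred_long: float,
                margin: float = 0.1) -> str:
    """Assumed decision rule: use long CoT only when its predicted
    pass rate beats short CoT by at least `margin`."""
    return "long" if pred_long - pred_short >= margin else "short"


# Example: long CoT helps on this prompt, so the rule routes to "long".
targets = build_targets([True, True, False, True], [True, True, True, True])
mode = choose_mode(targets.short, targets.long)
```

In this sketch, raising `margin` sends more prompts to the short mode and saves tokens at some risk to accuracy, while lowering it does the opposite. How the actual switcher sets this trade-off is not specified in the abstract.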
