Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning
Yang He, Xiao Ding, Bibo Cai, Yufei Zhang, Kai Xiong, Zhouhao Sun, Bing Qin, Ting Liu
TL;DR
Self-Route addresses the inefficiency of reasoning LLMs by automatically choosing between Short CoT and Long CoT based on capability estimation derived from a pre-inference stage. It introduces Gradient-10K, a densely sampled difficulty-gradient dataset used to train a router that detects model capability boundaries. Across diverse benchmarks and model scales, Self-Route achieves token reductions of 30–55% with less than 2% accuracy loss, and it remains effective in hybrid reasoning setups. The framework offers a practical, generalizable approach for efficient reasoning in real-world deployments by leveraging internal signals to adapt reasoning depth to problem difficulty.
Abstract
While reasoning-augmented large language models (RLLMs) significantly enhance complex task performance through extended reasoning chains, they inevitably introduce substantial unnecessary token consumption, particularly for simpler problems where Short Chain-of-Thought (Short CoT) suffices. This overthinking phenomenon leads to inefficient resource usage without proportional accuracy gains. To address this issue, we propose Self-Route, a dynamic reasoning framework that automatically selects between general and reasoning modes based on model capability estimation. Our approach introduces a lightweight pre-inference stage that extracts capability-aware embeddings from hidden-layer representations, enabling real-time evaluation of the model's ability to solve a given problem. We further construct Gradient-10K, a dataset built on model difficulty estimation with densely sampled complexity levels, to train the router for precise capability boundary detection. Extensive experiments demonstrate that Self-Route achieves accuracy comparable to that of reasoning models while reducing token consumption by 30–55% across diverse benchmarks. The framework is consistently effective across models with different parameter scales and reasoning paradigms, highlighting its general applicability and practical value.
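
The routing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the capability-aware embedding is already available as a plain vector, and it substitutes a small logistic-regression classifier for the router, whose actual architecture is not specified here. The names `train_router` and `route`, the toy two-dimensional embeddings, the labels, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_router(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit a logistic-regression router with plain SGD.

    Assumption: label 1 means the model solved the problem in the
    general (Short CoT) mode, label 0 means it did not.
    """
    w = [0.0] * len(embeddings[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def route(
    embedding: Sequence[float],
    w: Sequence[float],
    b: float,
    threshold: float = 0.5,
) -> str:
    """Pick a mode from the estimated probability that Short CoT suffices."""
    p_solvable = sigmoid(sum(wi * xi for wi, xi in zip(w, embedding)) + b)
    return "short_cot" if p_solvable >= threshold else "long_cot"


# Toy stand-ins for embeddings taken from a pre-inference pass (hypothetical).
train_x = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
train_y = [1, 1, 0, 0]
weights, bias = train_router(train_x, train_y)

easy_mode = route([0.85, 0.15], weights, bias)
hard_mode = route([0.15, 0.85], weights, bias)
```

In this sketch, a problem whose embedding resembles those the model handled in general mode is routed to Short CoT, and everything else falls back to Long CoT. Raising the threshold would send more problems to the reasoning mode, trading tokens for accuracy.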
