Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning

Yang He, Xiao Ding, Bibo Cai, Yufei Zhang, Kai Xiong, Zhouhao Sun, Bing Qin, Ting Liu

TL;DR

Self-Route addresses the inefficiency of reasoning LLMs by automatically choosing between Short CoT and Long CoT based on capability estimation derived from a pre-inference stage. It introduces Gradient-10K, a densely sampled difficulty-gradient dataset used to train a router that detects model capability boundaries. Across diverse benchmarks and model scales, Self-Route achieves token reductions of 30–55% with less than 2% accuracy loss, and it remains effective in hybrid reasoning setups. The framework offers a practical, generalizable approach for efficient reasoning in real-world deployments by leveraging internal signals to adapt reasoning depth to problem difficulty.

Abstract

While reasoning-augmented large language models (RLLMs) significantly enhance complex task performance through extended reasoning chains, they inevitably introduce substantial unnecessary token consumption, particularly for simpler problems where Short Chain-of-Thought (Short CoT) suffices. This overthinking phenomenon leads to inefficient resource usage without proportional accuracy gains. To address this issue, we propose Self-Route, a dynamic reasoning framework that automatically selects between general and reasoning modes based on model capability estimation. Our approach introduces a lightweight pre-inference stage to extract capability-aware embeddings from hidden layer representations, enabling real-time evaluation of the model's ability to solve problems. We further construct Gradient-10K, a model difficulty estimation-based dataset with dense complexity sampling, to train the router for precise capability boundary detection. Extensive experiments demonstrate that Self-Route achieves comparable accuracy to reasoning models while reducing token consumption by 30–55% across diverse benchmarks. The proposed framework demonstrates consistent effectiveness across models with different parameter scales and reasoning paradigms, highlighting its general applicability and practical value.
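
The routing idea in the abstract can be illustrated with a minimal sketch. It assumes the router is a simple logistic-regression probe over a capability vector; the abstract does not specify the router architecture, so that choice is an assumption. The names `train_router` and `route` and the toy two-dimensional vectors are hypothetical, and the vectors stand in for hidden-layer representations that a real pre-inference pass would extract from the model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_router(vectors, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with plain SGD.

    Label 1 means "Short CoT is expected to solve this problem",
    label 0 means "Long CoT is needed". (Hypothetical labeling scheme.)
    """
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def route(vector, w, b, threshold=0.5):
    """Pick a reasoning mode from the estimated probability of success."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, vector)) + b)
    return "short_cot" if p >= threshold else "long_cot"


# Toy stand-ins for capability vectors (assumption: real ones would be
# hidden-layer representations collected during pre-inference).
easy = [[1.0 + 0.1 * (i % 3), 0.5 - 0.1 * (i % 2)] for i in range(6)]
hard = [[-1.0 - 0.1 * (i % 3), -0.5 + 0.1 * (i % 2)] for i in range(6)]
vectors = easy + hard
labels = [1] * len(easy) + [0] * len(hard)

w, b = train_router(vectors, labels)
print(route([1.0, 0.5], w, b))
print(route([-1.0, -0.5], w, b))
```

The threshold is the knob that trades tokens for accuracy in this sketch: raising it sends more borderline problems to Long CoT, and lowering it favors Short CoT.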

Paper Structure

This paper contains 21 sections, 7 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Illustration of the problem scenario addressed by our method, highlighting the trade-offs between Long CoT Reasoning, Short CoT, and the Self-Route approach in balancing efficiency and accuracy.
  • Figure 2: Workflow from Gradient-10K dataset construction to Self-Route Inference. During capability boundary detection, the pre-inference module collects the hidden layer vector of the last token as the capability representation. The dense difficulty gradient of Gradient-10K ensures accurate reasoning mode selection, balancing correctness and efficiency in Self-Route Inference.
  • Figure 3: Routing accuracy of pre-inference vector representations from different hidden layers for Qwen2.5-7B and Qwen2.5-32B.
  • Figure 4: Radar chart comparing the accuracy of Qwen2.5-7B, R1-Distill-Qwen-7B, and Self-Route across multiple datasets. Self-Route maintains high accuracy while significantly reducing computational cost.
  • Figure 5: Model capability estimation across models at different difficulty levels of the dataset.
  • ...and 1 more figure