CP-Router: An Uncertainty-Aware Router Between LLM and LRM

Jiayuan Su, Fulin Lin, Zhaopeng Feng, Han Zheng, Teng Wang, Zhenyu Xiao, Xinlong Zhao, Zuozhu Liu, Lu Cheng, Hongwei Wang

TL;DR

CP-Router addresses the inefficiency of LRMs by routing prompts between an LLM and an LRM based on principled uncertainty estimates from Conformal Prediction (CP). It is training-free and model-agnostic, with Full and Binary Entropy (FBE) guiding adaptive CP threshold selection to separate easy and hard prompts. Across seven MCQA benchmarks and open-ended GSM8K, CP-Router reduces token usage while maintaining or improving accuracy, and it generalizes across diverse model pairings and QA formats. This work provides a practical, uncertainty-aware routing mechanism that enhances efficiency for collaborative AI systems without requiring additional model training.

Abstract

Recent advances in Large Reasoning Models (LRMs) have significantly improved long-chain reasoning capabilities over Large Language Models (LLMs). However, LRMs often produce unnecessarily lengthy outputs even for simple queries, leading to inefficiencies or even accuracy degradation compared to LLMs. To overcome this, we propose CP-Router, a training-free and model-agnostic routing framework that dynamically selects between an LLM and an LRM, demonstrated with multiple-choice question answering (MCQA) prompts. The routing decision is guided by the prediction uncertainty estimates derived via Conformal Prediction (CP), which provides rigorous coverage guarantees. To further refine the uncertainty differentiation across inputs, we introduce Full and Binary Entropy (FBE), a novel entropy-based criterion that adaptively selects the appropriate CP threshold. Experiments across diverse MCQA benchmarks, including mathematics, logical reasoning, and Chinese chemistry, demonstrate that CP-Router efficiently reduces token usage while maintaining or even improving accuracy compared to using LRM alone. We also extend CP-Router to diverse model pairings and open-ended QA, where it continues to demonstrate strong performance, validating its generality and robustness.

Paper Structure

This paper contains 26 sections, 1 theorem, 6 equations, 7 figures, 6 tables.

Key Result

Theorem 2.1

Suppose $(X_i, Y_i)_{i=1,\dots,n}$ and $(X_{\text{test}}, Y_{\text{test}})$ are independent and identically distributed (i.i.d.), and $\mathcal{C}: \mathcal{X} \rightarrow 2^{\mathcal{Y}}$ is a set-valued mapping satisfying the nesting property in Eq. nesting-pro. The following holds:

$$\mathbb{P}\big(Y_{\text{test}} \in \mathcal{C}(X_{\text{test}})\big) \geq 1 - \alpha,$$

where $\alpha \in (0, 1)$ is the user-defined error rate, and $\mathcal{C}(X_{\text{test}})$ is the prediction set for input $X_{\text{test}}$.
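A coverage guarantee of this kind is commonly obtained with split conformal calibration. The sketch below is one minimal way to do that for MCQA, not necessarily the authors' exact procedure: it assumes the nonconformity score is one minus the probability assigned to an option, and the names `conformal_quantile` and `prediction_set` are hypothetical.

```python
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha):
    """Threshold from a held-out calibration set.

    Assumed score (not confirmed by the source): s = 1 - p(true option).
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every option.
        return float("inf")
    return float(np.sort(scores)[k - 1])

def prediction_set(probs, qhat):
    """Indices of all options whose score 1 - p does not exceed the threshold."""
    probs = np.asarray(probs, dtype=float)
    return np.flatnonzero(1.0 - probs <= qhat)
```

A smaller `alpha` raises the threshold and therefore enlarges the prediction sets, which is why the choice of `alpha` shapes how well set size separates easy prompts from hard ones.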

Figures (7)

  • Figure 1: Token consumption for the question "What is the natural number that comes after 1?" LLMs provide correct answers with concise token usage, whereas LRMs consume significantly more tokens, suggesting a potential "overthinking" issue.
  • Figure 2: Key Components of the CP-Router Framework. (a) CP-Based Routing. For each prompt, it applies CP with a target error rate $\alpha$ to generate a prediction set based on LLM output probabilities. Prompts with small prediction sets are routed to an LLM, while those with large sets are routed to an LRM. This enables uncertainty-aware, dynamic routing. (b) FBE-Based Adaptive Calibration. Instead of using a fixed error rate, CP-Router leverages FBE to automatically select the optimal $\alpha$ that maximizes uncertainty separability, enabling more effective differentiation between easy and hard prompts—crucial for adaptive routing decisions.
  • Figure 3: Different prediction set size distributions. A manually chosen error rate $\alpha$ might lead to a poor spread, as in (b), whereas (a) illustrates a desirable distribution.
  • Figure 4: (a) Accuracy comparison between Qwen2.5-14B (LLM) and DeepSeek-R1-Distill-Qwen-14B (LRM) on the GPQA and CN-Chemistry benchmarks. The LRM performs better on GPQA, while the LLM achieves higher accuracy on CN-Chemistry. (b) Average Prediction Set Size (APSS) of Qwen2.5-14B under an error rate of 0.2 on both datasets. GPQA exhibits a larger APSS, suggesting higher prediction uncertainty.
  • Figure 5: Accuracy and prompt allocation of CP-Router on the GPQA benchmark using DeepSeek-V3 and DeepSeek-R1. CP-Router improves overall accuracy while reducing token consumption by avoiding routing a portion of the prompts to the more expensive R1 model.
  • ...and 2 more figures
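The routing rule in Figure 2(a) and the notion of a "spread" of prediction set sizes in Figure 3 can be illustrated with a short sketch. Two assumptions are made here: the cutoff `max_llm_size=1` is a hypothetical choice for what counts as a "small" set, and `size_entropy` is plain Shannon entropy over set sizes, used only as a simplified stand-in for FBE, whose exact definition is not given in this summary.

```python
import math
from collections import Counter

def route(set_size, max_llm_size=1):
    """Route by prediction set size: small sets go to the LLM, large ones to the LRM.

    max_llm_size is an assumed cutoff, not a value taken from the paper.
    """
    return "LLM" if set_size <= max_llm_size else "LRM"

def size_entropy(set_sizes):
    """Shannon entropy (in nats) of the empirical set-size distribution.

    A simplified proxy for how well set sizes are spread out; this is
    NOT the paper's FBE criterion.
    """
    n = len(set_sizes)
    counts = Counter(set_sizes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def pick_alpha(sizes_by_alpha):
    """Pick the candidate alpha whose set sizes have the highest entropy.

    sizes_by_alpha maps each candidate alpha to the list of set sizes it yields.
    """
    return max(sizes_by_alpha, key=lambda a: size_entropy(sizes_by_alpha[a]))
```

Under this proxy, an `alpha` that gives every prompt the same set size scores zero and carries no routing signal, whereas an `alpha` that yields a mix of sizes scores higher.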

Theorems & Definitions (1)

  • Theorem 2.1: Conformal Coverage Guarantee