Optimizing Reasoning Efficiency through Prompt Difficulty Prediction

Bo Zhao, Berkcan Kapusuzoglu, Kartik Balasubramaniam, Sambit Sahu, Supriyo Chakraborty, Genta Indra Winata

TL;DR

The paper tackles the high computational cost of reasoning with large language models by proposing a routing mechanism that assigns each problem to the smallest model likely to solve it. It builds lightweight predictors of problem difficulty and model correctness from intermediate representations of a strong 32B model (s1.1-32B) to guide routing across a pool of models. Evaluations on diverse math benchmarks using the MathCombined dataset show that difficulty- and accuracy-based routing substantially reduces inference compute while maintaining or exceeding the large model's accuracy. These results demonstrate the practicality of difficulty-aware routing for cost-efficient deployment of reasoning systems, with middle-layer representations offering the most predictive signals for routing decisions.
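The summary does not say what form the lightweight predictors take. Here is a minimal sketch. It assumes a linear probe fit by least squares on one fixed-size, standardized feature vector per problem from each layer of s1.1-32B; how those vectors are pooled from hidden states is also an assumption. The sketch trains one probe per layer and keeps the layer with the lowest validation error, which is the kind of layer-wise comparison behind the middle-layer finding. The helpers `train_linear_probe`, `predict`, `mse`, and `select_layer` are hypothetical, and the toy data is synthetic.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Probe = Tuple[Vector, float]  # (weights, bias) of a hypothetical linear probe


def predict(probe: Probe, x: Vector) -> float:
    w, b = probe
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_probe(
    xs: Sequence[Vector], ys: Sequence[float], lr: float = 0.1, epochs: int = 2000
) -> Probe:
    """Fit difficulty ~ w.x + b by full-batch gradient descent on mean squared error.

    Assumes standardized features; a fixed step size can diverge on raw hidden states.
    """
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = predict((w, b), x) - y
            for i in range(dim):
                grad_w[i] += 2.0 * err * x[i] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def mse(probe: Probe, xs: Sequence[Vector], ys: Sequence[float]) -> float:
    return sum((predict(probe, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_layer(
    train_feats: Dict[int, List[Vector]],
    train_y: Sequence[float],
    val_feats: Dict[int, List[Vector]],
    val_y: Sequence[float],
) -> Tuple[int, Probe, float]:
    """Train one probe per layer; return (layer, probe, val_mse) with the lowest val MSE."""
    best = None
    for layer in sorted(train_feats):
        probe = train_linear_probe(train_feats[layer], train_y)
        err = mse(probe, val_feats[layer], val_y)
        if best is None or err < best[2]:
            best = (layer, probe, err)
    return best


# Toy usage (synthetic features, not data from the paper): layer 1 encodes
# difficulty on a 1-5 scale, layer 0 is pure noise.
rng = random.Random(0)


def toy_split(n: int):
    ys = [float(rng.randint(1, 5)) for _ in range(n)]
    feats = {
        0: [[rng.random()] for _ in ys],
        1: [[y / 5 + 0.01 * rng.random()] for y in ys],
    }
    return feats, ys


train_feats, train_y = toy_split(40)
val_feats, val_y = toy_split(20)
best_layer, best_probe, best_err = select_layer(train_feats, train_y, val_feats, val_y)
print(best_layer, round(best_err, 3))
```

Plain gradient descent is used here only to stay dependency-free. On real hidden states, a ridge or logistic-regression probe from a numerical library would be the more practical choice.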

Abstract

Reasoning language models perform well on complex tasks but are costly to deploy due to their size and long reasoning traces. We propose a routing approach that assigns each problem to the smallest model likely to solve it, reducing compute without sacrificing accuracy. Using intermediate representations from s1.1-32B, we train lightweight predictors of problem difficulty or model correctness to guide routing across a pool of reasoning models. On diverse math benchmarks, routing improves efficiency over random assignment and matches s1.1-32B's performance while using significantly less compute. Our results demonstrate that difficulty-aware routing is effective for cost-efficient deployment of reasoning models.
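The abstract's rule, "the smallest model likely to solve it", becomes concrete once the predictor outputs one correctness probability per pool model. This is the accuracy-based variant described for Figure 4. The following is a minimal sketch. It assumes the pool is sorted from smallest to largest and falls back to the largest model when no model clears the threshold. That fallback, the `route_by_correctness` helper, the model names, and the relative costs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PoolModel:
    name: str
    cost: float  # hypothetical relative inference cost per problem


def route_by_correctness(
    probs: Sequence[float], pool: Sequence[PoolModel], threshold: float
) -> PoolModel:
    """Pick the smallest model whose predicted P(correct) is above the threshold.

    probs[i] is the predictor's estimate that pool[i] solves the problem, and
    pool is assumed to be sorted from smallest to largest.
    """
    if len(probs) != len(pool):
        raise ValueError("expected one probability per model in the pool")
    for p, model in zip(probs, pool):
        if p > threshold:
            return model
    # No model clears the bar: falling back to the largest model is an assumption.
    return pool[-1]


# Toy usage with a hypothetical three-model pool and made-up predictions.
pool = [PoolModel("small", 1.0), PoolModel("medium", 4.0), PoolModel("large", 16.0)]
predictions = [
    [0.90, 0.95, 0.97],  # easy: the small model suffices
    [0.30, 0.80, 0.90],  # medium difficulty
    [0.10, 0.20, 0.40],  # nothing clears the bar
]
chosen = [route_by_correctness(p, pool, threshold=0.7) for p in predictions]
routed_cost = sum(m.cost for m in chosen)
always_large_cost = len(predictions) * pool[-1].cost
print([m.name for m in chosen], routed_cost, always_large_cost)
```

Raising the threshold sends more problems to larger models, trading compute for accuracy. Sweeping it traces a cost/accuracy curve like the one in Figure 4.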

Paper Structure

This paper contains 14 sections and 10 figures.

Figures (10)

  • Figure 1: A classifier predicts problem difficulty from intermediate representations, and each problem is routed to the smallest reasoning model likely to solve it. This reduces inference cost while maintaining accuracy.
  • Figure 2: Prediction performance using outputs from different layers of s1.1-32B on (a) question difficulty level and (b) whether various language models can answer the given question correctly. Middle layers provide the most informative representations.
  • Figure 3: Performance of difficulty-based routing using s1.1-32B layer outputs. A problem is routed to a larger model if the predicted difficulty exceeds a threshold, and to a smaller model otherwise. Blue dots indicate router-based systems with thresholds between 2.1 and 2.9; orange dots show baseline models. Routers consistently outperform random assignment.
  • Figure 4: Performance of accuracy-based routing using s1.1-32B layer outputs. Each problem is routed to the weakest model with predicted correctness above a threshold. Blue dots correspond to thresholds between 0.05 and 0.9.
  • Figure 5: Difficulty level distribution of the MATH dataset.
  • Figures 6 to 10 are not listed here.
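The difficulty-based routing rule in Figure 3 needs only a scalar difficulty prediction and a threshold. This sketch assumes a two-model pool, hypothetical per-problem costs, and MATH-style difficulty levels from 1 to 5. That scale is consistent with the 2.1 to 2.9 thresholds, but treating it as the predictor's output range is an assumption. The sketch sweeps those thresholds and compares each router with a random assignment that sends the same number of problems to the larger model. Whether the paper's random baseline is matched this way is also an assumption. All function names and data are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def difficulty_route(pred_difficulty: Sequence[float], threshold: float) -> List[bool]:
    """True means the problem goes to the larger model (predicted difficulty > threshold)."""
    return [d > threshold for d in pred_difficulty]


def evaluate(
    to_large: Sequence[bool],
    small_correct: Sequence[bool],
    large_correct: Sequence[bool],
    small_cost: float,
    large_cost: float,
) -> Tuple[float, float]:
    """Return (mean cost per problem, accuracy) for one assignment."""
    n = len(to_large)
    cost = sum(large_cost if big else small_cost for big in to_large) / n
    hits = sum(lc if big else sc for big, sc, lc in zip(to_large, small_correct, large_correct))
    return cost, hits / n


def random_assignment(n_large: int, n: int, rng: random.Random) -> List[bool]:
    """Baseline: send the same number of problems to the larger model, chosen at random."""
    chosen = set(rng.sample(range(n), n_large))
    return [i in chosen for i in range(n)]


def sweep(pred_difficulty, small_correct, large_correct, small_cost, large_cost, thresholds, seed=0):
    """For each threshold, return (threshold, router (cost, acc), random (cost, acc))."""
    rng = random.Random(seed)
    args = (small_correct, large_correct, small_cost, large_cost)
    rows = []
    for t in thresholds:
        routed = difficulty_route(pred_difficulty, t)
        baseline = random_assignment(sum(routed), len(routed), rng)
        rows.append((t, evaluate(routed, *args), evaluate(baseline, *args)))
    return rows


# Toy data (not from the paper): levels 1-5, a near-perfect difficulty predictor,
# a small model that solves only levels 1-2, and a large model that solves all.
levels = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
pred = [lvl + 0.05 for lvl in levels]
small_ok = [lvl <= 2 for lvl in levels]
large_ok = [True] * len(levels)
thresholds = [round(2.1 + 0.1 * i, 1) for i in range(9)]  # 2.1 through 2.9
rows = sweep(pred, small_ok, large_ok, 1.0, 8.0, thresholds)
for t, (r_cost, r_acc), (b_cost, b_acc) in rows:
    print(f"t={t}: router cost={r_cost:.1f} acc={r_acc:.2f} | random acc={b_acc:.2f}")
```

Matching the number of large-model calls keeps the baseline at the same compute. Any accuracy gap then reflects the router's choice of which problems to escalate.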