A Unified Approach to Routing and Cascading for LLMs

Jasper Dekoninck, Maximilian Baader, Martin Vechev

TL;DR

The paper develops a theory-backed framework for selecting among multiple large language models by unifying routing and cascading into a single strategy, cascade routing. It derives optimal strategies for routing and for cascading as linear optimization problems, and proves the optimality of cascade routing under a cost budget. A key insight is that the quality of the estimators (ex-ante for routing, post-hoc for cascading) drives performance; cascade routing uses both to outperform baselines on benchmarks such as RouterBench and SWE-Bench. Empirically, cascade routing improves consistently across model pools and benchmarks, which demonstrates the value of integrating routing and cascading and the importance of accurate quality estimation for practical gains.

Abstract

The availability of a wide range of large language models (LLMs) embedded in various agentic systems has significantly increased the potential of model selection strategies to improve the cost-performance tradeoff. Existing strategies involve either routing, where a single model is chosen per query, or cascading, which sequentially runs increasingly larger models until a satisfactory answer is found. However, current approaches face three key limitations: they (1) lack formal proofs of optimality, (2) fail to identify the conditions under which these strategies are most effective to improve the cost-performance tradeoff, and (3) are unable to combine both paradigms for further improvements. To address these issues, we first derive a novel optimal strategy for cascading and prove the optimality of an existing routing strategy. Further, we propose cascade routing, a unified framework that integrates routing and cascading into a theoretically optimal strategy. Through our analysis, we identify good quality estimators as the critical factor for the success of model selection paradigms. Finally, in our experiments, we show that cascade routing consistently outperforms the individual approaches by a large margin and we analyze quality estimators to determine when routing and/or cascading are useful paradigms for model selection.
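To make the two paradigms named in the abstract concrete, here is a minimal Python sketch. It is an illustration only, not the paper's method: the function names `route` and `cascade`, the callables `pre_quality` and `post_quality`, and the fixed `threshold` stopping rule are all assumptions introduced for this example.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[str], str]  # hypothetical model interface: query -> answer


def route(
    query: str,
    models: Sequence[Model],
    pre_quality: Callable[[int, str], float],
) -> Tuple[str, List[int]]:
    """Routing: choose a single model up front from an ex-ante quality estimate."""
    best = max(range(len(models)), key=lambda i: pre_quality(i, query))
    return models[best](query), [best]


def cascade(
    query: str,
    models: Sequence[Model],
    post_quality: Callable[[int, str, str], float],
    threshold: float,
) -> Tuple[str, List[int]]:
    """Cascading: run models in order (small to large) and stop at the first
    answer whose post-hoc quality estimate reaches the threshold."""
    used: List[int] = []
    answer = ""
    for i, model in enumerate(models):
        answer = model(query)
        used.append(i)
        if post_quality(i, query, answer) >= threshold:
            break
    return answer, used


# Toy stand-ins for a small and a large model (assumed, for illustration).
def small(query: str) -> str:
    return "short answer"


def large(query: str) -> str:
    return "a much longer, detailed answer"


models = [small, large]
```

Routing calls exactly one model and relies on an estimate made before any answer exists, whereas cascading may call several models but can inspect each answer before deciding whether to continue. This is why the two paradigms depend on different kinds of quality estimators.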

Paper Structure

This paper contains 75 sections, 12 theorems, 16 equations, 4 figures, 11 tables, 3 algorithms.

Key Result

Theorem 1

For a cost budget $B$, there exist a $\lambda \in \mathbb{R}^+$ and a $\gamma \in [0, 1]$ such that the optimal routing strategy $s_{\textsc{opt}}$ equals $\gamma s_{\textsc{min}}^\lambda + (1-\gamma) s_{\textsc{max}}^\lambda$. Furthermore, all routing strategies that have an
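The convex combination in Theorem 1 can be read as a randomized choice between two deterministic routers. The sketch below is a hedged illustration. This excerpt does not define $s_{\textsc{min}}^\lambda$ and $s_{\textsc{max}}^\lambda$, so the code assumes both pick a model maximizing estimated quality minus $\lambda$ times estimated cost, and differ only in breaking ties toward the cheapest or the most expensive model. The function names and the tie tolerance are hypothetical.

```python
import random
from typing import Sequence


def route_lambda(
    quality: Sequence[float],
    cost: Sequence[float],
    lam: float,
    prefer_cheap: bool,
) -> int:
    """Return the index maximizing quality[i] - lam * cost[i].

    Assumption: ties are broken toward the cheapest model (s_min^lambda)
    or the most expensive model (s_max^lambda).
    """
    scores = [q - lam * c for q, c in zip(quality, cost)]
    top = max(scores)
    tied = [i for i, s in enumerate(scores) if abs(s - top) < 1e-12]
    pick = min if prefer_cheap else max
    return pick(tied, key=lambda i: cost[i])


def route_mixed(
    quality: Sequence[float],
    cost: Sequence[float],
    lam: float,
    gamma: float,
    rng: random.Random,
) -> int:
    """With probability gamma follow s_min^lambda, otherwise s_max^lambda."""
    prefer_cheap = rng.random() < gamma
    return route_lambda(quality, cost, lam, prefer_cheap)
```

In this reading, $\lambda$ sets the price of one unit of cost in units of quality, and $\gamma$ only matters for queries where several models tie, which is what lets the expected cost be tuned to the budget $B$.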

Figures (4)

  • Figure 1: Overview of three model selection strategies. Routing selects a single model for a query, cascading processes queries through a sequence of models, and cascade routing generalizes both.
  • Figure 2: Difference in AUC performance between cascade routing and baseline strategies on RouterBench for various noise values. Red indicates cascade routing is much better, while blue indicates it is only a bit better.
  • Figure 3: Runtime of cascade routing variants for different numbers of models.
  • Figure 4: Quality-cost tradeoff curves for several benchmarks.

Theorems & Definitions (26)

  • Definition 1: Routing
  • Definition 2: Optimal Routing
  • Theorem 1: Optimal Routing Strategy
  • Definition 3: Supermodel
  • Definition 4: Cascading Strategy
  • Theorem 2: Optimal Cascading Strategy
  • Corollary 1: Optimal Threshold Strategy
  • Definition 5: Cascade Routing
  • Theorem 3: Optimal Cascade Routing
  • Lemma 1: Negative Marginal Gain
  • ...and 16 more