CARROT: A Cost Aware Rate Optimal Router

Seamus Somerstep, Felipe Maia Polo, Allysson Flavio Melo de Oliveira, Prattyush Mangal, Mírian Silva, Onkar Bhardwaj, Mikhail Yurochkin, Subha Maity

TL;DR

This work addresses cost-aware routing among many LLMs. It formulates routing as a minimax problem and proposes CARROT, a plug-in router that selects a model for each query using estimates of that query's cost and accuracy under every candidate model. The authors prove that this simple two-stage plug-in approach is minimax rate-optimal, and they introduce the SPROUT dataset to benchmark cost-sensitive routing across diverse queries and models. Experiments on SPROUT and on prior benchmarks such as RouterBench and Open-LLM-Leaderboard-v2 show that CARROT can reach similar or better performance at a fraction of the cost, outperforming some baseline routers and, in specific settings, individual top models. The paper thus offers a principled approach to cost-efficient LLM routing, together with a dataset for further research on predictive routing.

Abstract

With the rapid growth in the number of Large Language Models (LLMs), there has been recent interest in LLM routing, or directing queries to the cheapest LLM that can deliver a suitable response. We conduct a minimax analysis of the routing problem, providing a lower bound and finding that a simple router that predicts both cost and accuracy for each question can be minimax optimal. Inspired by this, we introduce CARROT, a Cost AwaRe Rate Optimal rouTer that selects a model based on estimates of the models' cost and performance. Alongside CARROT, we also introduce the Smart Price-aware ROUTing (SPROUT) dataset to facilitate routing on a wide spectrum of queries with the latest state-of-the-art LLMs. Using SPROUT and prior benchmarks such as RouterBench and Open-LLM-Leaderboard-v2, we empirically validate CARROT's performance against several alternative routers.
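The routing rule described in the abstract, picking a model from per-query estimates of cost and accuracy, can be illustrated with a minimal sketch. Everything named here is hypothetical: the function `route`, the inputs `acc_hat` and `cost_hat`, and the trade-off weight `mu` are illustrative choices, and the linear trade-off between accuracy and cost is an assumption rather than the paper's stated objective. The estimates themselves would come from trained predictors, which are not shown.

```python
from typing import Sequence


def route(acc_hat: Sequence[float], cost_hat: Sequence[float], mu: float) -> int:
    """Return the index of the model chosen for a single query.

    acc_hat[m]  -- estimated accuracy of model m on this query (hypothetical input)
    cost_hat[m] -- estimated cost of model m on this query (hypothetical input)
    mu          -- assumed trade-off weight in [0, 1]:
                   0 ignores cost, 1 ignores accuracy
    """
    if len(acc_hat) == 0 or len(acc_hat) != len(cost_hat):
        raise ValueError("acc_hat and cost_hat must be non-empty and equally long")
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")

    # Assumed scoring rule: reward predicted accuracy, penalize predicted cost.
    scores = [(1.0 - mu) * a - mu * c for a, c in zip(acc_hat, cost_hat)]

    # Pick the highest-scoring model; ties go to the lowest index.
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up estimates for three models on one query (costs on an arbitrary scale).
    acc = [0.90, 0.80, 0.60]
    cost = [1.00, 0.20, 0.05]
    for weight in (0.0, 0.3, 1.0):
        print(weight, route(acc, cost, weight))
```

Under these assumptions, sweeping `mu` from 0 to 1 moves the choice from the most accurate model toward the cheapest one, which is how a single pair of predictors can serve many cost budgets.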

Paper Structure

This paper contains 41 sections, 13 theorems, 64 equations, 6 figures, 2 tables.

Key Result

Theorem 1.1

An LLM router that predicts both cost and accuracy for every question and all models in a family can achieve optimal statistical efficiency.
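
As a hedged illustration of what such a router looks like, one common way to write a plug-in rule is given below. The notation is an assumption introduced here, not taken from the paper: $\hat{a}_m(x)$ and $\hat{c}_m(x)$ denote the predicted accuracy and cost of model $m$ on query $x$, $M$ is the number of models in the family, and $\mu \in [0, 1]$ is a user-chosen weight on cost.

$$
\hat{g}_\mu(x) \in \arg\min_{m \in \{1, \dots, M\}} \Big\{ \mu \, \hat{c}_m(x) - (1 - \mu) \, \hat{a}_m(x) \Big\}
$$

In this form the predictors are fitted once, and changing $\mu$ changes only the final selection step.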

Figures (6)

  • Figure 1: Percent of GPT-4o performance achieved by CARROT across datasets at various discounted costs, where the blue dotted line indicates similar ($100\%$) performance to GPT-4o.
  • Figure 2: Performance of several routers and individual LLMs on the RouterBench test split.
  • Figure 3: CARROT routing analysis on the SPROUT and Open-LLM-Leaderboard-v2 datasets.
  • Figure 4: RouterBench models and benchmarks (Hu et al., 2024, Table 1).
  • Figure 5: RouterBench Supplementary.
  • ...and 1 more figure

Theorems & Definitions (28)

  • Theorem 1.1: Informal statement of the lower-bound and upper-bound theorems
  • Lemma 2.1
  • Example 3.1: Binary classification with $0/1$-loss
  • Definition 3.2: Hölder smoothness
  • Theorem 3.6
  • Remark 3.7
  • Theorem 3.9: Upper bound
  • Remark 3.10: Rate efficient routers
  • Lemma C.1
  • Definition C.2: Kernel regularity
  • ...and 18 more