RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers

Yifan Lu, Rixin Liu, Jiayi Yuan, Xingqi Cui, Shenrun Zhang, Hongyi Liu, Jiarong Xing

TL;DR

RouterArena introduces an open platform for comprehensive comparison of LLM routers, addressing the lack of standardized evaluation amid rapid router growth. It combines a Dewey Decimal Classification–based domain-diverse dataset, Bloom’s taxonomy–guided cognitive levels, empirically defined difficulty, a multi-metric leaderboard, and an automated framework for live updates. Empirical results reveal persistent accuracy–cost trade-offs across both commercial and open-source routers, with no router dominating all metrics and notable inefficiencies in current routing strategies. The framework and initial leaderboard aim to enable transparent, reproducible progress and guide the design of more cost-efficient, robust routing policies. Overall, RouterArena provides a practical, extensible basis for evaluating evolving router ecosystems and benchmarking future improvements.

Abstract

Today's LLM ecosystem comprises a wide spectrum of models that differ in size, capability, and cost. No single model is optimal for all scenarios; hence, LLM routers have become essential for selecting the most appropriate model under varying circumstances. However, the rapid emergence of various routers makes choosing the right one increasingly challenging. To address this problem, we need a comprehensive router comparison and a standardized leaderboard, similar to those available for models. In this work, we introduce RouterArena, the first open platform enabling comprehensive comparison of LLM routers. RouterArena has (1) a dataset constructed in a principled manner with broad knowledge domain coverage, (2) distinguishable difficulty levels for each domain, (3) an extensive list of evaluation metrics, and (4) an automated framework for leaderboard updates. Leveraging our framework, we have produced the initial leaderboard with a detailed comparison of metrics, as shown in Figure 1. Our framework for evaluating new routers is available at https://github.com/RouteWorks/RouterArena. Our leaderboard is available at https://routeworks.github.io/.

Paper Structure

This paper contains 52 sections, 2 equations, 12 figures, 9 tables.

Figures (12)

  • Figure 2: Timeline of example router-related works and products.
  • Figure 3: Dataset composition. For ease of demonstration, we merged some categories.
  • Figure 4: Each query is answered by 42 LLMs and evaluated for accuracy. Sorting queries by empirical accuracy yields smooth, monotonic curves overall and within each domain, indicating that query difficulty is well spread across the full range from very hard (left) to very easy (right).
  • Figure 5: RouterArena Live Leaderboard.
  • Figure 6: Deferral curve: accuracy versus cost.
  • ...and 7 more figures