LLMRank: Understanding LLM Strengths for Model Routing

Shubham Agrawal; Prasang Gupta

TL;DR

LLMRank addresses the model routing problem by combining explicit prompt features with a neural ranking model to predict per-model utility under a cost–quality trade-off, guided by a tunable parameter in the objective. It introduces a three-part framework: feature extraction, a cost-aware neural ranking model, and a cost-aware inference procedure, trained with a hybrid objective that blends pointwise utility regression and listwise ranking via KL divergence. On RouterBench, LLMRank achieves up to 89.2% of oracle utility, with Perf, Balanced, and Cost variants offering distinct trade-offs between quality and expense, while providing interpretable routing through feature attributions. The results show substantial efficiency gains over single-model baselines and prior routers, especially when open-source models are used, underscoring the value of semantic feature signals for scalable, transparent LLM deployment. This work points to a principled path for integrating new models and tasks in dynamic LLM ecosystems, with potential extensions to multimodal routing and session-level decision making.
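The TL;DR names three quantitative ingredients:

- a per-model utility that trades quality against cost through a tunable parameter;
- a hybrid loss that mixes pointwise utility regression with a listwise KL ranking term;
- an inference step that routes each prompt to the model with the highest predicted utility.

The sketch below shows one plausible reading of these pieces. It is not the authors' implementation, and it rests on several assumptions:

- Utility is taken as `quality - lam * cost`. The paper's exact functional form is not given here.
- Listwise targets and predictions become distributions over models through a softmax at a hypothetical temperature `tau`.
- `alpha` is a hypothetical mixing weight.
- "Fraction of oracle utility" is read as routed utility divided by the per-prompt best utility.

All function names are illustrative.

```python
import numpy as np

def utility(quality, cost, lam):
    """Hypothetical cost-aware utility: quality minus lam-weighted cost."""
    return quality - lam * cost

def log_softmax(x, tau=1.0):
    """Row-wise log-softmax at temperature tau (max-shifted for numerical stability)."""
    z = x / tau
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def hybrid_loss(pred, target, alpha=0.5, tau=1.0):
    """alpha * pointwise MSE + (1 - alpha) * listwise KL(target || pred).

    pred, target: (n_prompts, n_models) arrays of per-model utilities.
    alpha and tau are hypothetical hyperparameters, not values from the paper.
    """
    pointwise = np.mean((pred - target) ** 2)
    log_p = log_softmax(target, tau)  # target ranking distribution over models
    log_q = log_softmax(pred, tau)    # predicted ranking distribution over models
    kl = np.sum(np.exp(log_p) * (log_p - log_q), axis=-1).mean()
    return alpha * pointwise + (1 - alpha) * kl

def route(pred):
    """Cost-aware inference: pick the highest predicted-utility model per prompt."""
    return np.argmax(pred, axis=-1)

def oracle_fraction(true_util, choices):
    """Routed utility over best-possible utility (assumes positive utilities)."""
    routed = true_util[np.arange(len(choices)), choices].sum()
    return routed / true_util.max(axis=-1).sum()

# Toy example: 2 prompts x 3 models (large, medium, small); numbers are made up.
quality = np.array([[0.9, 0.8, 0.4], [0.7, 0.9, 0.6]])
cost = np.array([[1.0, 0.2, 0.05], [1.0, 0.2, 0.05]])
true_u = utility(quality, cost, lam=0.5)
pred = np.array([[0.5, 0.6, 0.3], [0.3, 0.5, 0.6]])  # stand-in ranker outputs
choices = route(pred)
print(choices, hybrid_loss(pred, true_u), oracle_fraction(true_u, choices))
```

In this toy case the router picks models `[1, 2]`. It captures 1.275 of a possible 1.5 utility, or 0.85 of oracle. A larger `lam` pushes the argmax toward cheaper models, which is one way to read the Perf, Balanced, and Cost variants. The paper's actual variant definitions are not stated in this section.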

Abstract

The rapid growth of large language models (LLMs) with diverse capabilities, latencies, and computational costs presents a critical deployment challenge: selecting the most suitable model for each prompt to optimize the trade-off between performance and efficiency. We introduce LLMRank, a prompt-aware routing framework that leverages rich, human-readable features extracted from prompts, including task type, reasoning patterns, complexity indicators, syntactic cues, and signals from a lightweight proxy solver. Unlike prior one-shot routers that rely solely on latent embeddings, LLMRank predicts per-model utility using a neural ranking model trained on RouterBench, which comprises 36,497 prompts spanning 11 benchmarks and 11 state-of-the-art LLMs, ranging from small efficient models to large frontier systems. Our approach achieves up to 89.2% of oracle utility while providing interpretable feature attributions that explain routing decisions. Extensive studies demonstrate the importance of multifaceted feature extraction and of the hybrid ranking objective, highlighting the potential of feature-driven routing for efficient and transparent LLM deployment.
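The abstract lists five feature families: task type, reasoning patterns, complexity indicators, syntactic cues, and proxy-solver signals. The sketch below turns a prompt into a small dictionary of human-readable features of that kind.

It is a hypothetical keyword-and-regex stand-in, not the paper's extractor. The keyword lists and feature names are invented for illustration. The proxy solver is modeled as an optional callable (`proxy_solver`) that returns a confidence score, because the abstract does not say what signal the proxy actually provides.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical keyword lists; not taken from the paper.
TASK_KEYWORDS = {
    "math": ("solve", "equation", "calculate", "integral", "prove"),
    "code": ("function", "python", "bug", "compile", "implement"),
    "qa": ("who", "what", "when", "where", "which"),
}
REASONING_CUES = ("step by step", "because", "therefore", "explain why")

def extract_features(prompt: str,
                     proxy_solver: Optional[Callable[[str], float]] = None) -> Dict[str, float]:
    """Map a prompt to named, interpretable features (illustrative heuristics only)."""
    text = prompt.lower()
    words = re.findall(r"[a-z0-9']+", text)
    features: Dict[str, float] = {}
    # Task type: one indicator per task in a small hypothetical taxonomy.
    for task, keywords in TASK_KEYWORDS.items():
        features[f"task_{task}"] = float(any(k in words for k in keywords))
    # Reasoning patterns: count of phrases that suggest multi-step reasoning.
    features["reasoning_cues"] = float(sum(text.count(c) for c in REASONING_CUES))
    # Complexity indicators: length in words and share of digit characters.
    features["n_words"] = float(len(words))
    features["digit_ratio"] = sum(ch.isdigit() for ch in prompt) / max(len(prompt), 1)
    # Syntactic cues: question marks, code-like punctuation, arithmetic symbols.
    features["n_questions"] = float(prompt.count("?"))
    features["has_code_punct"] = float(bool(re.search(r"[{};]", prompt)))
    features["has_math_symbols"] = float(bool(re.search(r"[=+\-*/^]", prompt)))
    # Proxy-solver signal: e.g. a small model's confidence (hypothetical callable).
    if proxy_solver is not None:
        features["proxy_confidence"] = float(proxy_solver(prompt))
    return features

feats = extract_features("Solve the equation 2x + 3 = 7 step by step. What is x?",
                         proxy_solver=lambda p: 0.8)
print(feats)
```

Because every feature has a name, per-feature attributions from a downstream ranker can be reported directly, for example "routed to a larger model because `reasoning_cues` and `task_math` are high". That property is the interpretability the abstract emphasizes over latent embeddings.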
