Eagle: Efficient Training-Free Router for Multi-LLM Inference

Zesen Zhao; Shuowei Jin; Z. Morley Mao

TL;DR

Eagle tackles scalable, budget-aware routing among multiple LLMs in online environments by combining a global measure of general ability with a local measure of specialized ability in an ELO-based framework that converts sparse user feedback into full model rankings. It is training-free: it uses a vector database to identify similar past queries and updates ratings incrementally without retraining. Empirical results on RouterBench show Eagle outperforming strong baselines in AUC while sharply reducing initialization and update times during online adaptation. The approach enables real-time, low-overhead model selection that maintains high inference quality in dynamic, high-volume LLM serving contexts.

Abstract

The proliferation of Large Language Models (LLMs) with varying capabilities and costs has created a need for efficient model selection in AI systems. LLM routers address this need by dynamically choosing the most suitable model for a given query based on task requirements and budget constraints. However, existing routers face challenges in scalability and real-time adaptation, particularly in high-volume online environments. We present Eagle, a novel LLM routing approach that combines global and local ELO ranking modules to overcome these limitations. By evaluating both general and specialized LLM abilities, Eagle provides a scalable, training-free solution that enhances model selection quality while reducing computational overhead. Our experiments across multiple datasets show that Eagle consistently outperforms baseline methods, with improvements of up to 23.52 percent in Area Under Curve (AUC) scores. Moreover, Eagle is highly efficient: it requires only 1/20 of the initialization time of baseline methods and performs incremental updates 100 to 200 times faster in online scenarios, making it well-suited for dynamic, high-volume online serving environments.
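The idea of combining global and local ELO rankings can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard Elo update rule, pairwise feedback records of the form (embedding, model_a, model_b, score_a), a brute-force cosine search standing in for the vector database, local ratings seeded from the global ones, and a linear blend of the two scores. The names `route`, `weight`, `n_neighbors`, the step size `K`, and the base rating are all hypothetical choices made for illustration.

```python
import math

K = 32.0       # assumed Elo step size (not taken from the paper)
BASE = 1000.0  # assumed starting rating for unseen models


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(ratings, model_a, model_b, score_a, k=K):
    """Apply one pairwise feedback record in place (score_a: 1 win, 0.5 tie, 0 loss)."""
    r_a = ratings.get(model_a, BASE)
    r_b = ratings.get(model_b, BASE)
    delta = k * (score_a - expected_score(r_a, r_b))
    ratings[model_a] = r_a + delta
    ratings[model_b] = r_b - delta  # zero-sum counterpart


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def route(query_vec, history, global_ratings, costs, budget,
          n_neighbors=2, weight=0.5):
    """Pick the affordable model with the best blended global/local rating."""
    # Local ratings: replay feedback from the most similar past queries,
    # starting from a copy of the global ratings (an assumption of this sketch).
    neighbors = sorted(history, key=lambda rec: cosine(query_vec, rec[0]),
                       reverse=True)[:n_neighbors]
    local = dict(global_ratings)
    for _, model_a, model_b, score_a in neighbors:
        elo_update(local, model_a, model_b, score_a)

    affordable = [m for m, c in costs.items() if c <= budget]
    if not affordable:
        return None

    def blended(m):
        return (weight * global_ratings.get(m, BASE)
                + (1.0 - weight) * local.get(m, BASE))

    return max(affordable, key=blended)
```

In this sketch a new feedback record only requires one call to `elo_update` on the global ratings plus appending the record to `history`, which reflects the training-free, incremental character described in the abstract; how Eagle itself stores and weights feedback is described in the paper's design sections.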

Paper Structure (12 sections, 2 equations, 4 figures)

Introduction
Eagle Architecture and Design
Eagle Design
Details of Eagle
Evaluation
Overall Performance
Online Adaptation Efficiency and Quality
Experimental Setup
Model Parameters
Baseline Configurations
Ablation Studies for Eagle
Related Works

Figures (4)

Figure 1: Eagle workflow: ① User request submission. ② Retrieval of relevant historical data. ③ LLM quality ranking and selection within budget. ④ Response generation and delivery. ⑤ Optional secondary model comparison and feedback collection.
Figure 2: Comparison of Baseline Models with Eagle.
Figure 3: Comparison of training time and quality.
Figure 4: Ablation studies for Eagle components and parameter sensitivity.