ICL-Router: In-Context Learned Model Representations for LLM Routing

Chenxu Wang, Hao Li, Yiqun Zhang, Linyao Chen, Jianhao Chen, Ping Jian, Peng Ye, Qiaosheng Zhang, Shuyue Hu

TL;DR

This work addresses the challenge of efficiently routing queries among multiple LLMs by proposing ICL-Router, a two-stage framework that uses in-context vectors to semantically profile model capabilities. A projector aligns query embeddings with a router's space, while autoregressive query reconstruction ensures meaningful representations; models are profiled via in-context performance on challenging queries, and a router learns to predict per-model success for new queries. The approach enables seamless addition of new LLMs without retraining and demonstrates state-of-the-art routing accuracy across both in-distribution and out-of-distribution benchmarks, with robust scalability as the model pool grows. Practically, ICL-Router offers a scalable, efficient routing solution that leverages compact vector representations to capture nuanced model strengths and supports dynamic model ecosystems in real-world deployments.

Abstract

Large language models (LLMs) often exhibit complementary strengths. Model routing harnesses these strengths by dynamically directing each query to the most suitable model, given a candidate model pool. However, routing performance relies on accurate model representations, and adding new models typically requires retraining, limiting scalability. To address these challenges, we propose a novel routing method using in-context vectors to represent model capabilities. The method proceeds in two stages. First, queries are embedded and projected into vectors, with a projector and LLM-based router trained to reconstruct the original queries, aligning vector representations with the router's semantic space. Second, each candidate model is profiled on a query set, and the router learns -- based on in-context vectors of query and model performance -- to predict whether each model can correctly answer new queries. Extensive experiments demonstrate that our method achieves state-of-the-art routing performance in both in-distribution and out-of-distribution tasks. Moreover, our method allows for seamless integration of new models without retraining the router. The code is available at https://github.com/lalalamdbf/ICL-Router.
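The routing decision described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_predict_success` is a hypothetical stand-in for the trained LLM-based router (it simply takes a similarity-weighted average of exemplar outcomes), and the names `Profile`, `route`, and the example model names are assumptions made for this sketch. Each model is represented only by its profile, a list of (query vector, correct?) pairs.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# A model profile: (exemplar query vector, did the model answer it correctly?)
Profile = List[Tuple[Vector, bool]]


def toy_predict_success(query: Vector, profile: Profile) -> float:
    """Hypothetical stand-in for the trained router.

    Scores a model by a softmax-weighted average of its exemplar outcomes,
    weighting exemplars by dot-product similarity to the query.
    """
    if not profile:
        raise ValueError("profile must contain at least one exemplar")
    sims = [sum(q * e for q, e in zip(query, vec)) for vec, _ in profile]
    top = max(sims)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in sims]
    total = sum(weights)
    return sum(w * float(ok) for w, (_, ok) in zip(weights, profile)) / total


def route(
    query: Vector,
    profiles: Dict[str, Profile],
    predict: Callable[[Vector, Profile], float] = toy_predict_success,
) -> Tuple[str, Dict[str, float]]:
    """Return the model with the highest predicted success, plus all scores."""
    scores = {name: predict(query, profile) for name, profile in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Two illustrative models with complementary strengths on two exemplar queries.
profiles: Dict[str, Profile] = {
    "model_a": [([1.0, 0.0], True), ([0.0, 1.0], False)],
    "model_b": [([1.0, 0.0], False), ([0.0, 1.0], True)],
}
best, scores = route([1.0, 0.0], profiles)

# Adding a model only requires profiling it; the predictor itself is unchanged.
profiles["model_c"] = [([1.0, 0.0], True), ([0.0, 1.0], True)]
best_after, scores_after = route([1.0, 0.0], profiles)
```

In this sketch a new model joins the pool by adding one entry to `profiles`, which mirrors the abstract's claim that new models can be integrated without retraining the router.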

Paper Structure

This paper contains 21 sections, 6 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: The two-stage ICL-Router framework. (1) Query Reconstruction Training: The projector is trained to align the embedding model and router dimensions, while the router reconstructs queries from projected vectors to learn their semantics. (2) ICL Model Routing Training: Each model’s capabilities are encoded as in-context vectors, and the router is trained to predict whether a given model can handle a specific query.
  • Figure 2: Effects of integrating new LLMs on in-distribution routing performance.
  • Figure 3: Effects of integrating new LLMs on out-of-distribution (OOD) routing performance.
  • Figure 4: Effects of in-context exemplar quantity on in-distribution performance.
  • Figure 5: Effects of in-context exemplar quantity on OOD performance.