RAGRouter: Learning to Route Queries to Multiple Retrieval-Augmented Language Models
Jiarui Zhang, Xiangyu Liu, Yong Hu, Chaoyue Niu, Fan Wu, Guihai Chen
TL;DR
RAGRouter addresses the problem of routing queries to multiple retrieval-augmented LLMs by explicitly modeling how retrieved documents shift each model's knowledge state. It combines document embeddings, cross-document interactions, and per-model RAG capability embeddings, trained with a contrastive objective and a binary accuracy classifier to capture retrieval-induced shifts and model heterogeneity. Empirical results across five knowledge-intensive tasks and both open- and closed-source LLMs show that RAGRouter consistently outperforms the best single LLM and non-RAG-aware baselines, while also achieving strong performance-efficiency trade-offs under low-latency constraints. The work demonstrates the importance of accounting for dynamic interactions between external knowledge and model capabilities in routing decisions for RAG systems.
Abstract
Retrieval-Augmented Generation (RAG) significantly improves the performance of Large Language Models (LLMs) on knowledge-intensive tasks. However, varying response quality across LLMs under RAG necessitates intelligent routing mechanisms, which select the most suitable model for each query from multiple retrieval-augmented LLMs via a dedicated router model. We observe that external documents dynamically affect LLMs' ability to answer queries, while existing routing methods, which rely on static parametric knowledge representations, exhibit suboptimal performance in RAG scenarios. To address this, we formally define the new retrieval-augmented LLM routing problem, incorporating the influence of retrieved documents into the routing framework. We propose RAGRouter, a RAG-aware routing design, which leverages document embeddings and RAG capability embeddings with contrastive learning to capture knowledge representation shifts and enable informed routing decisions. Extensive experiments on diverse knowledge-intensive tasks and retrieval settings, covering open- and closed-source LLMs, show that RAGRouter outperforms the best individual LLM and existing routing methods. With an extended score-threshold-based mechanism, it also achieves strong performance-efficiency trade-offs under low-latency constraints. The code and data are available at https://github.com/OwwO99/RAGRouter.
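
To make the routing idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each candidate model has a static knowledge vector, a RAG capability vector, and a known latency, and that retrieved documents shift the knowledge vector through a simple elementwise gate. The names Candidate, score, and route, the fusion rule, and all numeric values are hypothetical; the actual router is learned with a contrastive objective, whereas the vectors here are set by hand.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Candidate:
    name: str
    knowledge: list       # hypothetical static (parametric) knowledge embedding
    rag_capability: list  # hypothetical embedding of how well the model uses documents
    latency: float        # hypothetical average response latency in seconds


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score(query_emb, doc_emb, cand):
    # Assumed fusion rule: shift the static knowledge vector by the document
    # embedding, gated elementwise by the model's RAG capability vector.
    shifted = [k + c * d
               for k, c, d in zip(cand.knowledge, cand.rag_capability, doc_emb)]
    return cosine(query_emb, shifted)


def route(query_emb, doc_emb, candidates, threshold=0.0):
    """Return (chosen model name, per-model scores).

    With threshold=0.0 the top-scoring model is chosen. With a positive
    threshold, the fastest model scoring within `threshold` of the top
    score is chosen, trading some expected quality for lower latency.
    """
    scores = {c.name: score(query_emb, doc_emb, c) for c in candidates}
    best = max(scores.values())
    eligible = [c for c in candidates if best - scores[c.name] <= threshold]
    return min(eligible, key=lambda c: c.latency).name, scores


# Toy candidates: "large" knows more but ignores documents; "small" knows
# less but makes full use of retrieved documents and responds faster.
candidates = [
    Candidate("large", knowledge=[1.0, 1.0], rag_capability=[0.0, 0.0], latency=2.0),
    Candidate("small", knowledge=[1.0, 0.0], rag_capability=[1.0, 1.0], latency=0.5),
]
query = [0.0, 1.0]
no_docs = [0.0, 0.0]
helpful_docs = [0.0, 2.0]

choice_without_docs, _ = route(query, no_docs, candidates)
choice_with_docs, _ = route(query, helpful_docs, candidates)
choice_low_latency, _ = route(query, no_docs, candidates, threshold=1.0)
```

In this toy setup the routing decision changes once documents are supplied, which is the effect a router built only on static knowledge representations cannot capture. Raising the threshold lets the faster model win even when its score is lower, which is one plausible reading of the score-threshold-based mechanism mentioned in the abstract.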
