Harnessing the Power of Multiple Minds: Lessons Learned from LLM Routing
KV Aditya Srivatsa, Kaushal Kumar Maurya, Ekaterina Kochmar
TL;DR
No single open-source LLM dominates across benchmarks, which motivates routing each input to the most suitable model. The authors develop sparse LLM routing with classifier-based and clustering-based strategies and evaluate them on GSM8K and MMLU using a diverse pool of LLMs, defining oracle and classifier-based upper bounds to quantify the potential gains. Results show that routing can outperform the weaker LLMs in the pool but generally cannot surpass the top-performing LLM, which is attributed to limited training data, and that it comes with latency trade-offs. The work highlights both the feasibility and the limitations of LLM routing, and points to improvements in data, modeling, and routing policy as avenues for using multiple LLMs more efficiently and accurately.
Abstract
With the rapid development of LLMs, it is natural to ask how to harness their capabilities efficiently. In this paper, we explore whether it is feasible to direct each input query to the single most suitable LLM. To this end, we propose LLM routing for challenging reasoning tasks. Our extensive experiments suggest that such routing shows promise but is not feasible in all scenarios, so more robust approaches should be investigated to fill this gap.
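To make the idea of a routing upper bound concrete, the following is a minimal sketch and not the authors' implementation. It assumes that an oracle router counts a query as solved whenever at least one model in the pool answers it correctly; the text above does not spell out the definition. The model names, the correctness table, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical correctness table (made-up values, not results from the paper):
# correct[model][i] is True if `model` answered query i correctly.
correct: Dict[str, List[bool]] = {
    "model_a": [True, False, True, False],
    "model_b": [False, True, True, False],
    "model_c": [True, True, False, False],
}


def single_model_accuracy(table: Dict[str, List[bool]], model: str) -> float:
    """Accuracy when every query is sent to one fixed model."""
    answers = table[model]
    return sum(answers) / len(answers)


def oracle_accuracy(table: Dict[str, List[bool]]) -> float:
    """Assumed oracle bound: a query is solved if any model solves it."""
    n_queries = len(next(iter(table.values())))
    solved = sum(
        any(table[model][i] for model in table) for i in range(n_queries)
    )
    return solved / n_queries


def routed_accuracy(table: Dict[str, List[bool]], routes: List[str]) -> float:
    """Accuracy of a router that sends query i to the model routes[i]."""
    return sum(table[model][i] for i, model in enumerate(routes)) / len(routes)


if __name__ == "__main__":
    for name in correct:
        print(name, single_model_accuracy(correct, name))
    print("oracle", oracle_accuracy(correct))
    # A hypothetical routing decision, one model name per query.
    print("routed", routed_accuracy(correct, ["model_a", "model_b", "model_a", "model_c"]))
```

In this toy table each model alone solves half of the queries, while the assumed oracle solves three of four. The last query is solved by no model, so no routing policy can recover it; a learned router sits somewhere between the single-model scores and the oracle score, depending on how well it predicts which model will succeed.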
