RouterRetriever: Routing over a Mixture of Expert Embedding Models

Hyunji Lee; Luca Soldaini; Arman Cohan; Minjoon Seo; Kyle Lo

TL;DR

RouterRetriever addresses the challenge of retrieving across diverse domains by routing queries to a mixture of domain-specific embedding experts implemented as LoRA adapters on a fixed base encoder. A pilot embedding library guides the routing by comparing query embeddings to centroids associated with each expert, enabling scalable addition or removal of experts without retraining. Empirical results on the BEIR benchmark show consistent gains over MSMARCO-only and multi-task baselines, with strong generalization to unseen domains; the approach also outperforms common routing baselines from language modeling, underscoring the importance of retrieval-specific routing design. The work demonstrates the practical impact of domain-specialized experts for diverse retrieval tasks and outlines avenues for more powerful routing methods and efficiency improvements.
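The routing step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes cosine similarity as the similarity measure, averages the similarity over each expert's pilot embeddings, and uses hypothetical names (`route`, `pilot_library`) and toy 2-dimensional vectors in place of real base-encoder outputs.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def route(query_emb, pilot_library):
    """Pick the expert whose pilot embeddings are, on average, most similar
    to the query embedding.

    query_emb:     array of shape (dim,), produced by the base encoder.
    pilot_library: dict mapping expert name -> array of shape (n_pilots, dim).
    Cosine similarity is an assumption made for this sketch.
    """
    q = l2_normalize(np.asarray(query_emb, dtype=float))
    scores = {}
    for name, pilots in pilot_library.items():
        p = l2_normalize(np.asarray(pilots, dtype=float))
        scores[name] = float(np.mean(p @ q))  # average cosine similarity
    best = max(scores, key=scores.get)
    return best, scores


# Toy pilot library with three hypothetical experts.
pilot_library = {
    "A": [[1.0, 0.0], [0.9, 0.1]],
    "B": [[0.0, 1.0]],
    "C": [[-1.0, 0.0]],
}
query = [1.0, 0.05]
best, scores = route(query, pilot_library)

# Removing an expert only requires dropping its entry; no retraining.
without_a = {k: v for k, v in pilot_library.items() if k != "A"}
best_without_a, _ = route(query, without_a)
```

Because the router is only a lookup over stored pilot embeddings, adding or removing an expert amounts to adding or deleting a dictionary entry, which mirrors the training-free extensibility the summary describes. The selected expert's LoRA module would then be attached to the base encoder to produce the final query embedding.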

Abstract

Information retrieval methods often rely on a single embedding model trained on large, general-domain datasets like MSMARCO. While this approach can produce a retriever with reasonable overall performance, such a retriever often underperforms models trained on domain-specific data when tested on their respective domains. Prior work in information retrieval has tackled this through multi-task training, but the idea of routing over a mixture of domain-specific expert retrievers remains unexplored despite the popularity of such ideas in language model generation research. In this work, we introduce RouterRetriever, a retrieval model that leverages a mixture of domain-specific experts by using a routing mechanism to select the most appropriate expert for each query. RouterRetriever is lightweight and allows easy addition or removal of experts without additional training. Evaluation on the BEIR benchmark demonstrates that RouterRetriever outperforms both models trained on MSMARCO (+2.1 absolute nDCG@10) and multi-task models (+3.2). This is achieved by employing our routing mechanism, which surpasses other routing techniques commonly used in language modeling (+1.8 on average). Furthermore, the benefit generalizes well to other datasets, even when no expert has been trained on the dataset in question. RouterRetriever is the first work to demonstrate the advantages of routing over a mixture of domain-specific expert embedding models as an alternative to a single, general-purpose embedding model, especially when retrieving from diverse, specialized domains.
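For readers unfamiliar with the metric behind the reported gains, the following is a minimal sketch of nDCG@10 for a single query. It assumes the common linear-gain formulation (relevance divided by log2 of rank + 1); the evaluation tooling used for BEIR may differ in details, and the function names are illustrative only.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results.

    relevances: graded relevance labels in ranked order (best-ranked first).
    Uses linear gain, which is an assumption of this sketch.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """Normalize DCG by the DCG of an ideal ordering of all judged documents."""
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# One relevant document exists for the query.
perfect = ndcg_at_k([1, 0, 0], [1, 0, 0])   # relevant document ranked first
second = ndcg_at_k([0, 1, 0], [1, 0, 0])    # relevant document ranked second
missed = ndcg_at_k([0, 0, 0], [1, 0, 0])    # relevant document not retrieved
```

A benchmark score is then the mean of this per-query value over all test queries.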

Paper Structure (45 sections, 8 figures, 19 tables, 1 algorithm)

This paper contains 45 sections, 8 figures, 19 tables, 1 algorithm.

Introduction
Related Works
Domain Specific Retriever
Routing Techniques
Router Retriever
Experts
Pilot Embedding Library
Routing Mechanism
Experimental Setup
Baselines
Dataset
Hyperparameters
Results
Overall Performance
Comparing Different Routing Techniques
...and 30 more sections

Figures (8)

Figure 1: RouterRetriever: ① Given a query, we first extract its embedding using a base encoder. We then calculate an average similarity between the query embedding (black dot) and the pilot embeddings for each expert (orange dots for Expert A, red dots for Expert B, and blue dots for Expert C). The expert with the highest average similarity (Expert A in this case) is selected. ② The final query embedding is then produced by passing the query to Expert Encoder A, which consists of the base encoder combined with the selected expert LoRA module.
Figure 2: t-SNE visualization of Contriever embeddings for queries (left) and contexts (right), with 100 instances sampled from each dataset. "General-domain" datasets like ArguAna and MSMARCO (blue) show high dispersion, while "domain-specific" datasets like HotPotQA (green), NFCorpus (grey), SciFact (pink), and FiQA (purple) are tightly clustered. Datasets like Quora (yellow) have dispersed queries but compact contexts.
Figure 3: Single expert performance (nDCG@10; y-axis) against number of training instances (x-axis). Each line color represents the training dataset used, and each plot is a BEIR test dataset. As we increase training set size, in-domain performance increases rapidly, but may not transfer to improved out-of-domain performance.
Figure 4: Average nDCG@10 (y-axis) by the number of experts (x-axis) for various models. RouterRetriever tends to show improved performance as the number of experts increases, outperforming a single MSMARCO-trained model even with just three experts despite less training data.
Figure 5: Average instance-level oracle routing performance in nDCG@10 (y-axis) by the number of available experts (x-axis). Performance improves quickly as the first few experts are added, followed by diminishing returns.
...and 3 more figures
