Optimizing Compound Retrieval Systems
Harrie Oosterhuis, Rolf Jagerman, Zhen Qin, Xuanhui Wang
TL;DR
This work tackles the question of how to form high-quality document rankings by combining multiple prediction models beyond traditional cascades. It introduces compound retrieval systems and an optimization framework that learns where to apply component models and how to aggregate their predictions, enabling interactions with large language models (LLMs) alongside traditional retrievers like BM25. The approach is instantiated with a three-model setup (BM25, a pointwise LLM predictor, and a pairwise LLM predictor) and optimized under supervised or self-supervised objectives to balance effectiveness and efficiency. Results show that optimized compound systems can surpass cascading baselines in effectiveness-efficiency trade-offs, achieving competitive ranking quality with orders of magnitude fewer LLM calls and revealing a range of novel, non-cascading strategies for model interaction.
Abstract
Modern retrieval systems do not rely on a single ranking model to construct their rankings. Instead, they generally take a cascading approach where a sequence of ranking models is applied in multiple re-ranking stages. Thereby, they balance the quality of the top-K ranking with computational costs by limiting the number of documents each model re-ranks. However, the cascading approach is not the only way models can interact to form a retrieval system. We propose the concept of compound retrieval systems as a broader class of retrieval systems that apply multiple prediction models. This class encompasses cascading models but also allows types of interaction other than top-K re-ranking. In particular, we enable interactions with large language models (LLMs), which can provide relative relevance comparisons. We focus on the optimization of compound retrieval system design, which uniquely involves learning where to apply the component models and how to aggregate their predictions into a final ranking. This work shows how our compound approach can combine the classic BM25 retrieval model with state-of-the-art (pairwise) LLM relevance predictions, while optimizing a given ranking metric and efficiency target. Our experimental results show that optimized compound retrieval systems provide better trade-offs between effectiveness and efficiency than cascading approaches, even when applied in a self-supervised manner. With the introduction of compound retrieval systems, we hope to inspire the information retrieval field to more out-of-the-box thinking on how prediction models can interact to form rankings.
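To make the two design choices named in the abstract concrete (where to apply each component model, and how to aggregate their predictions), the following is a minimal illustrative sketch and not the method proposed in this work. It assumes hand-picked selection sets and a simple weighted-sum aggregation, whereas the paper learns both. The names `compound_rank`, `point_sel`, `pair_sel`, and `weights` are hypothetical, and the `pointwise` and `pairwise` callables stand in for LLM predictors.

```python
from typing import Callable, Dict, List, Set, Tuple


def compound_rank(
    bm25: Dict[str, float],
    pointwise: Callable[[str], float],
    pairwise: Callable[[str, str], float],
    point_sel: Set[str],
    pair_sel: Set[Tuple[str, str]],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> Tuple[List[str], int]:
    """Rank documents from BM25 scores plus selectively applied LLM predictions.

    Assumptions (not taken from the paper): aggregation is a weighted sum,
    and pairwise(a, b) returns the probability that a is more relevant than b.
    Returns the ranking and the number of (stand-in) LLM calls made.
    """
    w_bm25, w_point, w_pair = weights
    # BM25 is cheap, so it is applied to every candidate document.
    scores = {doc: w_bm25 * score for doc, score in bm25.items()}
    llm_calls = 0

    # Pointwise predictions are requested only for the selected documents.
    for doc in point_sel:
        scores[doc] += w_point * pointwise(doc)
        llm_calls += 1

    # Pairwise predictions are requested only for the selected document pairs.
    for a, b in pair_sel:
        preference = pairwise(a, b) - 0.5  # positive if a is preferred over b
        scores[a] += w_pair * preference
        scores[b] -= w_pair * preference
        llm_calls += 1

    # Sort by descending score; ties are broken by document id for determinism.
    ranking = sorted(scores, key=lambda doc: (-scores[doc], doc))
    return ranking, llm_calls
```

Under these assumptions, a cascade corresponds to one particular choice of selections: `point_sel` holds the top documents by BM25 and `pair_sel` holds pairs among the top documents after the pointwise step. Any other choice of `point_sel` and `pair_sel` gives a non-cascading strategy, and the size of the two sets directly determines the number of LLM calls.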
