Optimizing Compound Retrieval Systems

Harrie Oosterhuis, Rolf Jagerman, Zhen Qin, Xuanhui Wang

TL;DR

This work tackles the question of how to form high-quality document rankings by combining multiple prediction models beyond traditional cascades. It introduces compound retrieval systems and an optimization framework that learns where to apply component models and how to aggregate their predictions, enabling interactions with large language models (LLMs) alongside traditional retrievers like BM25. The approach is instantiated with a three-model setup (BM25, a pointwise LLM predictor, and a pairwise LLM predictor) and optimized under supervised or self-supervised objectives to balance effectiveness and efficiency. Results show that optimized compound systems can surpass cascading baselines in effectiveness-efficiency trade-offs, achieving competitive ranking quality with orders of magnitude fewer LLM calls and revealing a range of novel, non-cascading strategies for model interaction.

Abstract

Modern retrieval systems do not rely on a single ranking model to construct their rankings. Instead, they generally take a cascading approach where a sequence of ranking models is applied in multiple re-ranking stages. Thereby, they balance the quality of the top-K ranking with computational costs by limiting the number of documents each model re-ranks. However, the cascading approach is not the only way models can interact to form a retrieval system. We propose the concept of compound retrieval systems as a broader class of retrieval systems that apply multiple prediction models. This encapsulates cascading models but also allows types of interaction other than top-K re-ranking. In particular, we enable interactions with large language models (LLMs), which can provide relative relevance comparisons. We focus on the optimization of compound retrieval system design, which uniquely involves learning where to apply the component models and how to aggregate their predictions into a final ranking. This work shows how our compound approach can combine the classic BM25 retrieval model with state-of-the-art (pairwise) LLM relevance predictions, while optimizing a given ranking metric and efficiency target. Our experimental results show that optimized compound retrieval systems provide better trade-offs between effectiveness and efficiency than cascading approaches, even when applied in a self-supervised manner. With the introduction of compound retrieval systems, we hope to inspire more out-of-the-box thinking in the information retrieval field on how prediction models can interact to form rankings.

Paper Structure

This paper contains 28 sections, 34 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overview of the compound retrieval system described in Section \ref{sec:method}. A first-stage retrieval model $M_0$ retrieves documents to create a first ranking $R_0$; based on their positions in $R_0$, the policy $\pi$ decides to which documents the pointwise prediction model $M_1$ and the pairwise prediction model $M_2$ are applied. Subsequently, the predictions of $M_1$ and $M_2$ are gathered only where activated and combined into a final ranking $R^*$ using the score aggregation function $f$.
  • Figure 2: Effectiveness-efficiency trade-off curves (averages over 50 runs), created by optimizing compound systems with various trade-off weights, and by varying the top-$K$ to re-rank for cascade systems. Annotations indicate the highest baseline effectiveness.
  • Figure 3: Examples of (deterministic) selection policies of our optimized compound retrieval systems. Top bars display the selection of pointwise predictions; square matrices display the selection of pairwise predictions (black means selected). Pixel indices correspond to the first-stage ranking (e.g., the $i$th pixel in the top bar indicates whether the pointwise prediction for the $i$th document in the BM25 ranking was selected). Above each policy is the ranking loss used for optimization and its total number of selections $N$.
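
The pipeline of Figure 1, combined with a deterministic selection policy of the kind shown in Figure 3, can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation. The function name `compound_rank`, the reciprocal-rank prior, the fixed weights, and the simple additive rule standing in for the aggregation function $f$ are all assumptions made here; in the paper, both the policy and the aggregation are learned. The pairwise model is assumed to return a preference in $[0, 1]$ that the first document is more relevant than the second.

```python
from typing import Callable, List, Sequence, Set, Tuple


def compound_rank(
    first_stage: Sequence[str],
    pointwise_selected: Set[int],
    pairwise_selected: Set[Tuple[int, int]],
    pointwise_model: Callable[[str], float],
    pairwise_model: Callable[[str, str], float],
    pointwise_weight: float = 1.0,
    pairwise_weight: float = 1.0,
) -> Tuple[List[str], int]:
    """Re-rank `first_stage` (R_0) using only the selected model calls.

    `pointwise_selected` holds positions in R_0 (the top bar in Figure 3);
    `pairwise_selected` holds position pairs (the square matrix in Figure 3).
    Returns the final ranking and the total number of selections N.
    """
    n = len(first_stage)

    # Hypothetical prior from the first-stage rank: earlier positions score higher.
    scores = [1.0 / (i + 1) for i in range(n)]

    # Pointwise predictions are gathered only where the policy selected them.
    for i in sorted(pointwise_selected):
        scores[i] += pointwise_weight * pointwise_model(first_stage[i])

    # Pairwise predictions shift score from the less preferred document
    # to the more preferred one (assumed preference in [0, 1]).
    for i, j in sorted(pairwise_selected):
        preference = pairwise_model(first_stage[i], first_stage[j])
        scores[i] += pairwise_weight * (preference - 0.5)
        scores[j] -= pairwise_weight * (preference - 0.5)

    # Stable sort by descending score; ties keep the first-stage order.
    order = sorted(range(n), key=lambda i: -scores[i])
    num_selections = len(pointwise_selected) + len(pairwise_selected)
    return [first_stage[i] for i in order], num_selections


# Toy usage with made-up stand-in models (no real LLM is called).
toy_pointwise = {"a": 0.0, "b": 0.0, "c": 2.0}
ranking, n_calls = compound_rank(
    first_stage=["a", "b", "c"],
    pointwise_selected={2},          # score only the document at position 2
    pairwise_selected={(0, 1)},      # compare positions 0 and 1 once
    pointwise_model=lambda doc: toy_pointwise[doc],
    pairwise_model=lambda x, y: 0.0,  # always prefers the second document
)
print(ranking, n_calls)
```

With empty selection sets the function returns the first-stage ranking unchanged at a cost of $N = 0$, whereas a cascade corresponds to selecting only a contiguous top-$K$ prefix of positions. The non-cascading policies in Figure 3 are other choices of these two sets.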