Guiding Retrieval using LLM-based Listwise Rankers

Mandeep Rathee, Sean MacAvaney, Avishek Anand

TL;DR

This work tackles the bounded recall problem in cascaded retrieval when using listwise LLM rerankers by proposing SlideGar, a sliding-window, graph-augmented adaptive retrieval method. SlideGar alternates between the initial retrieved pool and a corpus-graph frontier, using the LLM to rank a window and employing reciprocal rank as a pseudo-score to guide subsequent expansion, thereby mitigating the exclusion of relevant documents not initially retrieved. Across MSMARCO and MSMARCO-passage-v2, with diverse retrievers and rankers, SlideGar yields up to $13.23\%$ improvements in $nDCG@10$ and up to $28.02\%$ in $Recall@c$, while incurring only about $0.02\%$ additional latency relative to standard LLM reranking. This approach enables broader adoption of LLM-based reranking in settings with limited initial results or high first-stage costs, and the authors release their code for public use.

Abstract

Large Language Models (LLMs) have shown strong promise as rerankers, especially in "listwise" settings where an LLM is prompted to rerank several search results at once. However, this "cascading" retrieve-and-rerank approach is limited by the bounded recall problem: relevant documents not retrieved initially are permanently excluded from the final ranking. Adaptive retrieval techniques address this problem, but do not work with listwise rerankers because they assume a document's score is computed independently from other documents. In this paper, we propose an adaptation of an existing adaptive retrieval method that supports the listwise setting and helps guide the retrieval process itself (thereby overcoming the bounded recall problem for LLM rerankers). Specifically, our proposed algorithm merges results both from the initial ranking and from feedback documents provided by the most relevant documents seen up to that point. Through extensive experiments across diverse LLM rerankers, first-stage retrievers, and feedback sources, we demonstrate that our method can improve nDCG@10 by up to 13.23% and recall by up to 28.02%, all while keeping the total number of LLM inferences constant and the overhead of the adaptive process minimal. The work opens the door to leveraging LLM-based search in settings where the initial pool of results is limited, e.g., by legacy systems or by the cost of deploying a semantic first stage.
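
The bounded recall problem described above can be illustrated with a few lines of Python. This is a toy sketch rather than anything from the paper: the document ids, the relevant set, and the helper `recall_at` are all hypothetical. It only shows that a reranker, which merely permutes the retrieved pool, cannot raise recall at the pool size.

```python
from typing import Iterable, Sequence


def recall_at(ranking: Sequence[str], relevant: Iterable[str], c: int) -> float:
    """Fraction of relevant documents found in the top-c of a ranking."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranking[:c]) & relevant) / len(relevant)


# Hypothetical first-stage pool: "d7" is relevant but was never retrieved.
pool = ["d1", "d2", "d3", "d4"]
relevant = {"d2", "d7"}

# Any reranker only reorders the pool, so "d7" stays out of reach.
reranked = ["d2", "d4", "d1", "d3"]

recall_before = recall_at(pool, relevant, c=4)
recall_after = recall_at(reranked, relevant, c=4)
```

Both values are equal here: reranking can move "d2" to the top, which helps rank-sensitive metrics such as nDCG@10, but recall at the pool size is fixed by the first stage unless new documents are brought in.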

Paper Structure

This paper contains 18 sections, 3 figures, 3 tables, and 1 algorithm.

Figures (3)

  • Figure 1: The SlideGar algorithm visualized. The LLM ranker ranks the list of documents (the window); SlideGar then uses the ranker's feedback to look up the neighbors of $d_4$ and $d_2$, and carries both documents and their neighbors into the next window. The neighborhood documents are highlighted in green. The remaining documents, $d_1$ and $d_3$, are added to $R_1$.
  • Figure 2: Effect of SlideGar on different ranking pipelines on the TREC DL19 dataset as the number of graph neighbors $k$ varies, with a window size of 20 and a step size of 10.
  • Figure : SlideGar: Sliding Window based Graph Adaptive Retrieval
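
The window step described in Figure 1 can be sketched roughly as follows. This is an illustrative reading of the TL;DR and the captions, not the authors' released code: the function name `slidegar_sketch`, its parameters, the rule for alternating sources, and the toy ranker are all assumptions. The one detail taken from the text is that a document's reciprocal rank in the window serves as the pseudo-score for its graph neighbors.

```python
from typing import Callable, Dict, List, Sequence


def slidegar_sketch(
    initial: Sequence[str],
    neighbors: Dict[str, List[str]],
    rank_window: Callable[[List[str]], List[str]],
    window_size: int = 4,
    step: int = 2,
    budget: int = 8,
) -> List[str]:
    """Hypothetical sliding-window adaptive reranking loop (assumed design)."""
    pool = list(dict.fromkeys(initial))   # first-stage results, best first
    frontier: Dict[str, float] = {}       # neighbor id -> best pseudo-score
    seen: set = set()                     # documents already sent to the ranker
    window: List[str] = []                # documents carried between steps
    tail: List[str] = []                  # documents dropped from earlier windows
    from_frontier = False                 # the first window uses the initial pool

    def draw(n: int, prefer_frontier: bool) -> List[str]:
        picked: List[str] = []
        while len(picked) < n and len(seen) < budget:
            cand_frontier = [d for d in frontier if d not in seen]
            cand_pool = [d for d in pool if d not in seen]
            if prefer_frontier and cand_frontier:
                doc = max(cand_frontier, key=frontier.get)
            elif cand_pool:
                doc = cand_pool[0]
            elif cand_frontier:
                doc = max(cand_frontier, key=frontier.get)
            else:
                break
            seen.add(doc)
            picked.append(doc)
        return picked

    keep = window_size - step
    while len(seen) < budget:
        new_docs = draw(window_size - len(window), from_frontier)
        if not new_docs:
            break
        ranked = rank_window(window + new_docs)   # one listwise ranker call
        # Assumption: only the documents carried forward expand the frontier,
        # with reciprocal rank as the pseudo-score of their neighbors.
        for rank, doc in enumerate(ranked[:keep], start=1):
            for nb in neighbors.get(doc, []):
                if nb not in seen:
                    frontier[nb] = max(frontier.get(nb, 0.0), 1.0 / rank)
        window, dropped = ranked[:keep], ranked[keep:]
        tail = dropped + tail                     # later drops outrank earlier ones
        from_frontier = not from_frontier
    return window + tail


# Toy usage: "x" and "y" are reachable only through the corpus graph.
toy_relevance = {"x": 3, "a": 2, "y": 1}


def toy_ranker(docs: List[str]) -> List[str]:
    # Stand-in for the LLM ranker: sort by a fixed relevance table.
    return sorted(docs, key=lambda d: -toy_relevance.get(d, 0))


result = slidegar_sketch(
    initial=["a", "b", "c", "d", "e", "f"],
    neighbors={"a": ["x"], "x": ["y"]},
    rank_window=toy_ranker,
)
```

In this toy run the final ranking contains "x" and "y" even though neither appears in the initial list, while the number of documents shown to the ranker stays within `budget`. The window size of 20 and step size of 10 mentioned for Figure 2 would correspond to `window_size=20, step=10` under this sketch's naming.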