Top-Down Partitioning for Efficient List-Wise Ranking
Andrew Parry, Sean MacAvaney, Debasis Ganguly
TL;DR
The paper addresses the inefficiency of list-wise ranking with large language models under context-window limits, where sliding-window re-ranking is costly and inherently sequential. It introduces a pivot-based top-down partitioning algorithm (TDPart) that selects a high-ranked pivot from the top window, compares it against documents further down the ranking, and iteratively gathers top-k candidates within a fixed budget, which allows the partitions to be scored in parallel. Empirical results on MS MARCO and BEIR show that TDPart matches or surpasses sliding-window baselines while reducing the expected number of inference calls by about 33% at depth 100, with the strongest gains when the first-stage retriever is reliable. The work also highlights the order sensitivity of list-wise rankers and the value of strong initial rankings, and suggests that TDPart can improve the efficiency of both ranking and data annotation in large-scale retrieval systems, although domain-transfer challenges remain and warrant further research on robustness.
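The pivot-based partitioning described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' implementation: `make_oracle_ranker` is a hypothetical stand-in for an LLM that sorts a window by known relevance scores, and the window size (20), cutoff k (10), and depth budget (100) are illustrative defaults. The sketch ranks the top window, takes its k-th document as the pivot, compares the pivot against the rest of the pool in independent chunks of window - 1 documents, and recurses only on the documents that beat the pivot.

```python
from typing import Callable, Dict, List

Ranker = Callable[[List[str]], List[str]]


def make_oracle_ranker(relevance: Dict[str, float], log: List[List[str]]) -> Ranker:
    """Hypothetical stand-in for an LLM list-wise ranker: sorts a window by known scores."""
    def rank(window: List[str]) -> List[str]:
        log.append(list(window))  # record every inference call and its inputs
        return sorted(window, key=lambda d: relevance[d], reverse=True)
    return rank


def top_down_partition(ranking: List[str], rank: Ranker,
                       window: int = 20, k: int = 10, depth: int = 100) -> List[str]:
    """Simplified pivot-based partitioning (assumed variant, not the paper's code).

    For a consistent ranker such as the oracle above, the first k results are exact.
    """
    assert window > k >= 1
    pool, tail = ranking[:depth], ranking[depth:]
    if len(pool) <= window:
        return rank(pool) + tail  # a single inference covers the whole pool

    # Step 1: rank the top window; its k-th document becomes the pivot.
    first = rank(pool[:window])
    pivot = first[k - 1]
    candidates, backfill = first[:k - 1], first[k:]

    # Step 2: compare the pivot against the remaining documents, window - 1 at a time.
    # Each call needs only the pivot and its chunk, so the calls could run in parallel.
    rest = pool[window:]
    found_deeper = False
    for start in range(0, len(rest), window - 1):
        ranked = rank([pivot] + rest[start:start + window - 1])
        cut = ranked.index(pivot)
        found_deeper = found_deeper or cut > 0
        candidates.extend(ranked[:cut])       # documents that beat the pivot
        backfill.extend(ranked[cut + 1:])     # documents ranked below the pivot

    # Step 3: order everything that beat the pivot.
    if not found_deeper:
        head = candidates + [pivot]  # already ordered by the first call
    else:
        head = top_down_partition(candidates + [pivot], rank, window, k,
                                  depth=len(candidates) + 1)
    return head + backfill + tail


docs = [f"d{i}" for i in range(100)]
relevance = {d: 100 - i for i, d in enumerate(docs)}  # perfect first-stage order
log: List[List[str]] = []
out = top_down_partition(docs, make_oracle_ranker(relevance, log))
print(out[:3], len(log))  # ['d0', 'd1', 'd2'] 6

buried = dict(relevance, d50=1000)  # one highly relevant document at rank 51
log_b: List[List[str]] = []
out_b = top_down_partition(docs, make_oracle_ranker(buried, log_b))
print(out_b[:3], len(log_b))  # ['d50', 'd0', 'd1'] 7
```

With a perfect first-stage ranking, this sketch issues 6 calls at depth 100 (one for the top window and five pivot comparisons); when a highly relevant document is buried at rank 51, it needs a seventh call to re-rank the enlarged candidate set, which mirrors the observation that gains are largest when the first-stage retriever is reliable. For comparison, a bottom-up sliding window with window 20 and stride 10 makes 9 sequential calls at the same depth. These counts follow from the sketch's assumptions and are not the paper's expected-cost analysis.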
Abstract
Large Language Models (LLMs) have significantly impacted many facets of natural language processing and information retrieval. Unlike previous encoder-based approaches, the enlarged context window of these generative models allows them to rank multiple documents at once, an approach commonly called list-wise ranking. However, there are still limits to the number of documents that can be ranked in a single inference of the model, which has led to the broad adoption of a sliding window approach to identify the k most relevant items in a ranked list. We argue that the sliding window approach is not well-suited for list-wise re-ranking because it (1) cannot be parallelized in its current form, (2) performs redundant computation by repeatedly re-scoring the best set of documents as it works its way up the initial ranking, and (3) takes a bottom-up approach that prioritizes scoring the lowest-ranked documents rather than the highest-ranked ones. Motivated by these shortcomings and an initial study showing that list-wise rankers are biased towards relevant documents at the start of their context window, we propose a novel algorithm that partitions a ranking to depth k and processes documents top-down. Unlike sliding window approaches, our algorithm is inherently parallelizable due to its use of a pivot element, which can be compared concurrently to documents down to an arbitrary depth. In doing so, we reduce the expected number of inference calls by around 33% when ranking at depth 100 while matching the performance of prior approaches across multiple strong re-rankers.
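The bottom-up sliding window criticized in the abstract can be sketched under simple assumptions: `make_oracle_ranker` is a hypothetical stand-in for the LLM that sorts a window by known relevance scores, and the window size (20), stride (10), and depth (100) are illustrative values rather than settings taken from the paper. The sketch shows shortcomings (1) and (2): each call consumes the previous call's output, so the loop cannot be parallelized, and the best documents found so far are carried upward and re-scored in every window.

```python
from typing import Callable, Dict, List

Ranker = Callable[[List[str]], List[str]]


def make_oracle_ranker(relevance: Dict[str, float], log: List[List[str]]) -> Ranker:
    """Hypothetical stand-in for an LLM list-wise ranker: sorts a window by known scores."""
    def rank(window: List[str]) -> List[str]:
        log.append(list(window))  # record each inference call and its inputs
        return sorted(window, key=lambda d: relevance[d], reverse=True)
    return rank


def sliding_window(ranking: List[str], rank: Ranker,
                   window: int = 20, stride: int = 10, depth: int = 100) -> List[str]:
    """Bottom-up sliding window: start at the deepest window and move toward the top."""
    result = list(ranking)
    depth = min(depth, len(result))
    start = max(depth - window, 0)
    while True:
        end = min(start + window, depth)
        # Re-rank the current window in place; the next window overlaps it,
        # so every iteration depends on the previous one and the loop is sequential.
        result[start:end] = rank(result[start:end])
        if start == 0:
            break
        start = max(start - stride, 0)
    return result


docs = [f"d{i}" for i in range(100)]
relevance = {d: i for i, d in enumerate(docs)}  # worst case: initial order is reversed
log: List[List[str]] = []
out = sliding_window(docs, make_oracle_ranker(relevance, log))
print(out[:3], len(log))  # ['d99', 'd98', 'd97'] 9
```

In this run the sliding window makes 9 sequential calls (windows starting at positions 80, 70, ..., 0), and the most relevant document, d99, is passed to the ranker in all nine of them, which illustrates the redundant re-scoring of the best documents as the window works its way up the ranking.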
