Breaking the Lens of the Telescope: Online Relevance Estimation over Large Retrieval Sets
Mandeep Rathee, V Venktesh, Sean MacAvaney, Avishek Anand
TL;DR
This work addresses the recall limitations of traditional telescoping retrieval by introducing Online Relevance Estimation (ORE), an online, bandit-based re-ranking framework that continuously updates relevance scores for a large pool of candidates. ORE learns a linear EstRel function over a compact feature set and selects batches of documents to score with the expensive ranker, balancing exploration and exploitation to recover relevant documents that early-stage retrievers might miss. Empirical results on MS MARCO and TREC DL datasets show substantial recall improvements and efficiency gains in hybrid and adaptive retrieval scenarios, with speedups of up to 2–9× depending on ranker cost. The approach fits into existing retrieval stacks and offers a principled, data-efficient way to bridge retrieval and ranking while mitigating the recall-precision tradeoff inherent in telescoping pipelines.
Abstract
Advanced relevance models, such as those that use large language models (LLMs), provide highly accurate relevance estimations. However, their computational costs make them infeasible for processing large document corpora. To address this, retrieval systems often employ a telescoping approach, where computationally efficient but less precise lexical and semantic retrievers filter potential candidates for further ranking. This approach depends heavily on the quality of early-stage retrieval, which can exclude relevant documents early in the process. In this work, we propose a novel paradigm for re-ranking called online relevance estimation that continuously updates relevance estimates for a query throughout the ranking process. Instead of re-ranking a fixed set of top-k documents in a single step, online relevance estimation iteratively re-scores smaller subsets of the most promising documents while adjusting relevance scores for the remaining pool based on the estimations from the final model, using an online bandit-based algorithm. This dynamic process mitigates the recall limitations of telescoping systems by re-prioritizing documents initially deemed less relevant by earlier stages, including those completely excluded by earlier-stage retrievers. We validate our approach on TREC benchmarks under two scenarios: hybrid retrieval and adaptive retrieval. Experimental results demonstrate that our method is sample-efficient and significantly improves recall, highlighting the effectiveness of our online relevance estimation framework for modern search systems.
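The iterative loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `online_relevance_estimation`, the epsilon-greedy batch selection, and the least-mean-squares weight update are all assumptions chosen for brevity, standing in for the bandit algorithm and feature set that the paper itself defines. The sketch only shows the general shape: score a small batch with the expensive ranker, use those scores to update a cheap linear estimator, and re-prioritize the remaining pool.

```python
import random


def online_relevance_estimation(features, expensive_score, budget,
                                batch_size=4, epsilon=0.2, lr=0.1, seed=0):
    """Hypothetical sketch of an online re-ranking loop.

    features: dict mapping doc_id -> list of floats (cheap per-document features)
    expensive_score: callable doc_id -> float (stand-in for the costly ranker)
    budget: maximum number of calls to expensive_score
    """
    rng = random.Random(seed)
    dim = len(next(iter(features.values())))
    weights = [0.0] * dim   # linear relevance estimator, learned online
    scored = {}             # doc_id -> score returned by the expensive ranker
    pool = set(features)    # candidates not yet scored

    def estimate(doc_id):
        return sum(w * x for w, x in zip(weights, features[doc_id]))

    while pool and len(scored) < budget:
        k = min(batch_size, budget - len(scored), len(pool))
        # Rank the unscored pool by current estimate (ties broken by doc_id).
        ranked = sorted(pool, key=lambda d: (-estimate(d), d))
        batch = []
        for _ in range(k):
            # Assumption: epsilon-greedy choice between exploring a random
            # candidate and exploiting the current best estimate.
            pick = rng.choice(ranked) if rng.random() < epsilon else ranked[0]
            ranked.remove(pick)
            batch.append(pick)
        for doc_id in batch:
            y = expensive_score(doc_id)
            scored[doc_id] = y
            pool.discard(doc_id)
            # Assumption: a least-mean-squares step moves the estimator
            # toward the expensive ranker's score for this document.
            err = y - estimate(doc_id)
            for i, x in enumerate(features[doc_id]):
                weights[i] += lr * err * x

    ranking = sorted(scored, key=scored.get, reverse=True)
    return ranking, weights
```

In this sketch the expensive ranker is called at most `budget` times, and documents that start with a low estimate can still surface later, either through exploration or because the updated weights raise their estimated relevance.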
