AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking

Soyoung Yoon, Gyuwan Kim, Gyu-Hwung Cho, Seung-won Hwang

TL;DR

AcuRank tackles the high cost of LLM-based listwise reranking under context constraints by introducing uncertainty-aware adaptive computation, guided by a Bayesian TrueSkill model to maintain probabilistic document relevance. The method iteratively refines uncertain documents: initializing with first-stage scores, estimating top-k probabilities via a thresholded latent-score mechanism, and selectively reranking only ambiguous candidates until confidence stabilizes. Empirical results on TREC-DL and BEIR show AcuRank achieves superior accuracy-efficiency trade-offs across diverse retrievers and rerankers, with scalable compute and robust generalization to out-of-domain settings. The framework provides a flexible anytime approach, allowing practitioners to trade off accuracy and compute, while offering avenues for future work in richer uncertainty signals and reasoning-aware retrieval.

Abstract

Listwise reranking with large language models (LLMs) enhances top-ranked results in retrieval-based applications. Because context size is limited and long-context inference is expensive, reranking is typically performed over small subsets of a fixed size, with the final ranking aggregated from these partial results. This fixed computation disregards query difficulty and document distribution, leading to inefficiencies. We propose AcuRank, an adaptive reranking framework that dynamically adjusts both the amount and target of computation based on uncertainty estimates over document relevance. Using a Bayesian TrueSkill model, we iteratively refine relevance estimates until reaching sufficient confidence levels. Our explicit modeling of ranking uncertainty enables principled control over reranking behavior and avoids unnecessary updates to confident predictions. Results on the TREC-DL and BEIR benchmarks show that our method consistently achieves a superior accuracy-efficiency trade-off and scales better with compute than fixed-computation baselines. These results highlight the effectiveness and generalizability of our method across diverse retrieval tasks and LLM-based reranking models.

Paper Structure

This paper contains 54 sections, 4 equations, 3 figures, 18 tables.

Figures (3)

  • Figure 1: Overview of AcuRank. (a): Each retrieved document's relevance is initialized as a Gaussian distribution using a TrueSkill-based model, with its mean and variance representing estimated relevance and uncertainty. (b): We estimate the probability of a document being in the top-$k$ as the chance its score exceeds a threshold such that the expected number of documents above it equals $k$. Documents with uncertain rankings are selected and reranked in groups, and their relevance estimates are updated. (c): The process repeats until stopping criteria are met. The final ranking is based on the updated relevance estimates.
  • Figure 2: Pareto curves showing the trade-off between accuracy (NDCG@10) and efficiency (# of reranker calls) across reranking methods.
  • Figure 3: Correlation between WIG-based query difficulty and # of reranker calls (Section \ref{sec:adaptive_allocation}).
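
The top-$k$ probability estimate described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes each document's latent score is an independent Gaussian with mean `mu` and standard deviation `sigma`, finds by bisection the threshold at which the expected number of documents above it equals $k$, and flags documents whose top-$k$ probability is neither clearly high nor clearly low. The function names, the bisection search, and the `low`/`high` cutoffs are illustrative assumptions, not taken from the paper, and the TrueSkill update that follows each reranker call is not shown.

```python
import math


def normal_sf(x, mu, sigma):
    """P(S > x) for S ~ N(mu, sigma^2)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def expected_count_above(t, mus, sigmas):
    """Expected number of documents whose latent score exceeds t."""
    return sum(normal_sf(t, m, s) for m, s in zip(mus, sigmas))


def find_threshold(mus, sigmas, k, iters=100):
    """Bisection for the threshold t where expected_count_above(t) == k.

    Assumes 0 < k < len(mus). The expected count decreases
    monotonically in t, so bisection converges.
    """
    spread = 10.0 * max(sigmas)
    lo, hi = min(mus) - spread, max(mus) + spread
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_count_above(mid, mus, sigmas) > k:
            lo = mid  # too many documents above: raise the threshold
        else:
            hi = mid  # too few documents above: lower the threshold
    return 0.5 * (lo + hi)


def top_k_probabilities(mus, sigmas, k):
    """Per-document probability that its score exceeds the threshold."""
    t = find_threshold(mus, sigmas, k)
    return [normal_sf(t, m, s) for m, s in zip(mus, sigmas)]


def select_uncertain(probs, low=0.1, high=0.9):
    """Indices of documents whose top-k membership is still ambiguous.

    The cutoffs low and high are hypothetical values for illustration.
    """
    return [i for i, p in enumerate(probs) if low < p < high]


# Toy example: one clearly relevant document, two tied candidates,
# and one clearly irrelevant document competing for k = 2 slots.
mus = [30.0, 25.0, 25.0, 10.0]
sigmas = [2.0, 2.0, 2.0, 2.0]
probs = top_k_probabilities(mus, sigmas, k=2)
uncertain = select_uncertain(probs)
print(probs, uncertain)
```

In this toy setting only the two tied candidates are flagged, so a reranker call would be spent on them rather than on the documents whose top-$k$ membership is already settled.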