Leveraging Reference Documents for Zero-Shot Ranking via Large Language Models

Jieran Li, Xiuyuan Hu, Yang Zhao, Shengyao Zhuang, Hao Zhang

TL;DR

This work tackles the efficiency-accuracy tension in zero-shot LLM reranking by introducing RefRank, an anchor-based relative scoring framework. By treating the top-ranked retrieval as a shared anchor, RefRank converts pointwise scoring into a linear-time, batchable contrastive process that preserves the benefits of inter-document comparison without quadratic costs. The authors present two variants—RefRank-Single and RefRank-Multiple—and demonstrate state-of-the-art or near-state-of-the-art NDCG@10 across six datasets and multiple LLM backbones, with substantial latency and memory advantages over traditional pairwise and listwise methods. The results suggest RefRank as a practical drop-in improvement for production reranking that leverages upstream retrieval signals more effectively than prior approaches.

Abstract

Large Language Models (LLMs) have demonstrated exceptional performance in the task of text ranking for information retrieval. While Pointwise ranking approaches offer computational efficiency by scoring documents independently, they often yield biased relevance estimates due to the lack of inter-document comparisons. In contrast, Pairwise methods improve ranking accuracy by explicitly comparing document pairs, but suffer from substantial computational overhead with quadratic complexity ($O(n^2)$). To address this tradeoff, we propose RefRank, a simple and effective comparative ranking method based on a fixed reference document. Instead of comparing all document pairs, RefRank prompts the LLM to evaluate each candidate relative to a shared reference anchor. By selecting the reference anchor that encapsulates the core query intent, RefRank implicitly captures relevance cues, enabling indirect comparison between documents via this common anchor. This reduces computational cost to linear time ($O(n)$) while, importantly, preserving the advantages of comparative evaluation. To further enhance robustness, we aggregate multiple RefRank outputs using a weighted averaging scheme across different reference choices. Experiments on several benchmark datasets and with various LLMs show that RefRank significantly outperforms Pointwise baselines and can achieve performance at least on par with Pairwise approaches at a significantly lower computational cost.
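
The reference-anchored scoring described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `toy_scorer` is a hypothetical stand-in for the LLM comparison prompt (it only counts query-term overlap), and the names `refrank_scores`, `rerank`, `num_refs`, and `weights` are assumptions made for illustration. The sketch assumes the candidates arrive in first-stage retrieval order, so the top-ranked ones serve as references. Uniform weights give a plain mean over references; the paper's actual weighting scheme is not reproduced here.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical signature for the comparison call: given (query, candidate,
# reference), return how much more relevant the candidate is than the
# reference (higher is better). In practice this would wrap an LLM prompt.
Scorer = Callable[[str, str, str], float]


def toy_scorer(query: str, candidate: str, reference: str) -> float:
    """Toy proxy for the LLM: difference in query-term overlap."""
    terms = set(query.lower().split())

    def overlap(text: str) -> int:
        return len(terms & set(text.lower().split()))

    return float(overlap(candidate) - overlap(reference))


def refrank_scores(
    query: str,
    docs: Sequence[str],
    scorer: Scorer,
    num_refs: int = 1,
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Score each doc against the top `num_refs` docs (the shared anchors).

    Makes len(docs) * num_refs scorer calls, i.e. linear in len(docs)
    for a fixed number of references.
    """
    refs = list(docs[:num_refs])
    if weights is None:
        ws = [1.0] * len(refs)  # uniform weights: plain mean over references
    else:
        ws = list(weights)[: len(refs)]
    total = sum(ws)
    return [
        sum(w * scorer(query, doc, ref) for w, ref in zip(ws, refs)) / total
        for doc in docs
    ]


def rerank(
    query: str,
    docs: Sequence[str],
    scorer: Scorer,
    num_refs: int = 1,
    weights: Optional[Sequence[float]] = None,
) -> List[str]:
    """Return docs sorted by reference-relative score, best first."""
    scores = refrank_scores(query, docs, scorer, num_refs, weights)
    # sorted() is stable, so ties keep their first-stage retrieval order.
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order]


if __name__ == "__main__":
    query = "zero shot llm ranking"
    docs = [
        "cats sleep a lot",
        "zero shot ranking with llm",
        "weather today",
        "llm ranking",
    ]
    print(rerank(query, docs, toy_scorer, num_refs=1))  # single reference
    print(rerank(query, docs, toy_scorer, num_refs=2))  # two references
```

Because every candidate is scored against the same anchor, the scores are directly comparable and the calls are independent of one another, which is what allows them to be batched. A full pairwise comparison would instead need on the order of $n^2$ scorer calls.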

Paper Structure

This paper contains 19 sections, 3 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Paradigms of zero-shot reranking: (a) Individual-scoring, (b) Comparative-sorting, and (c) Our RefRank.
  • Figure 2: Illustration of RefRank's two inference schemes. (a) RefRank-Single: each candidate is compared against one fixed reference (e.g., top-1) and ranked by the resulting relative score. (b) RefRank-Multiple: each candidate is scored against top-$k$ references and the final score is obtained by mean pooling, yielding smoother and more robust rankings.
  • Figure 3: NDCG@10 versus selected reference passage index for two models.
  • Figure 4: Correlation between the fraction of reliable reference passages and NDCG@10. The fitted regression line indicates a strong positive relationship.
  • Figure 5: NDCG@10 versus the number of top-$k$ reference documents sequentially weighted.
  • ...and 2 more figures