FIRST: Faster Improved Listwise Reranking with Single Token Decoding

Revanth Gangi Reddy; JaeHyeok Doo; Yifei Xu; Md Arafat Sultan; Deevya Swain; Avirup Sil; Heng Ji

TL;DR

The paper addresses the inefficiency of traditional listwise LLM rerankers that generate full ranking sequences by proposing FIRST, which ranks candidates from the logits of the first generated identifier. It introduces a joint learning-to-rank objective $L_{Joint} = L_{LM} + \lambda L_{Rank}$, combining the standard LM loss with a RankNet-style loss to prioritize top-ranked passages. Empirically, FIRST achieves about a 50% reduction in inference latency and delivers competitive BEIR results, with ablations showing the benefit of the ranking loss and weighting strategies. It also demonstrates that relevance feedback using an LLM reranker can provide superior distillation signals for improving retriever recall compared to cross-encoders.
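The joint objective above can be illustrated with a minimal sketch in plain Python. It assumes that the ranking term is a pairwise RankNet-style loss over identifier scores and that pairs nearer the top of the list are weighted more heavily via an inverse-rank-sum weight. That weighting, and the names `ranknet_loss`, `joint_loss`, and `lam`, are assumptions for illustration and are not taken from this summary.

```python
import math

def ranknet_loss(scores, weighted=True):
    """Pairwise RankNet-style loss over candidates listed in gold order.

    scores[i] is the model score (e.g. an identifier logit) for the passage
    whose gold rank is i + 1, so scores[0] belongs to the most relevant one.
    """
    total = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(i + 1, n):
            # Large when the lower-ranked passage j outscores passage i.
            pair = math.log1p(math.exp(scores[j] - scores[i]))
            # Assumed weighting: pairs near the top of the list count more.
            weight = 1.0 / ((i + 1) + (j + 1)) if weighted else 1.0
            total += weight * pair
    return total

def joint_loss(lm_loss, scores, lam=1.0):
    """L_Joint = L_LM + lambda * L_Rank, with lm_loss computed elsewhere."""
    return lm_loss + lam * ranknet_loss(scores)
```

Scores that follow the gold order give a smaller ranking loss than scores that reverse it, which is the behavior the ranking term is meant to reward.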

Abstract

Large Language Models (LLMs) have significantly advanced the field of information retrieval, particularly for reranking. Listwise LLM rerankers have showcased superior performance and generalizability compared to existing supervised approaches. However, conventional listwise LLM reranking methods lack efficiency as they provide ranking output in the form of a generated ordered sequence of candidate passage identifiers. Further, they are trained with the typical language modeling objective, which treats all ranking errors uniformly, potentially at the cost of misranking highly relevant passages. Addressing these limitations, we introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates. Further, we incorporate a learning-to-rank loss during training, prioritizing ranking accuracy for the more relevant passages. Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining a robust ranking performance with gains across the BEIR benchmark. Finally, to illustrate the practical effectiveness of listwise LLM rerankers, we investigate their application in providing relevance feedback for retrievers during inference. Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
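As a rough illustration of single-token decoding, the sketch below orders candidates by the logit that each candidate's identifier token receives at the first output position. The toy vocabulary, the token ids, and the function name `rank_from_first_token` are hypothetical. In practice the logits would come from one forward pass of the reranker, which is not shown here.

```python
def rank_from_first_token(logits, identifier_token_ids):
    """Order candidates by their identifier's logit at the first decoding step.

    logits: floats indexed by vocabulary token id (first output position).
    identifier_token_ids: dict mapping candidate label -> identifier token id.
    """
    scored = {label: logits[tok] for label, tok in identifier_token_ids.items()}
    # Highest logit first: no further tokens need to be generated.
    return sorted(scored, key=scored.get, reverse=True)

# Toy vocabulary of six tokens; ids 2, 3, and 5 are hypothetical identifiers.
first_step_logits = [0.1, -1.0, 2.5, 4.0, 0.0, 3.2]
ids = {"A": 2, "B": 3, "C": 5}
ranking = rank_from_first_token(first_step_logits, ids)
print(ranking)  # ['B', 'C', 'A']
```

Because the full ordering is read from a single position, the cost of generating the remaining identifiers is avoided, which is consistent with the latency reduction reported in the abstract.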

Paper Structure (18 sections, 3 equations, 4 figures, 4 tables)

Introduction
Related Work
Reranking with LLMs
Learning to Rank
Listwise Reranking
Methodology
Listwise Reranking with LLMs
FIRST: Ranking with a Single Token
Experiments
Setup
Model:
Datasets:
Reranking Setup:
Baselines:
Ranking Performance
...and 3 more sections

Figures (4)

Figure 1: FIRST (b) directly ranks candidates using the output vocabulary logits for the first generated identifier, as opposed to the generation approach (a) of generating the entire ordered sequence. A learning-to-rank loss is incorporated during training to provide supervision to the model for ranking using single-token decoding.
Figure 2: The percentage of times the rank generated by an LLM reranker (RankZephyr; Pradeep et al., 2023) for a candidate agrees with the rank implied by its computed logit for the same candidate in the first (top-rank) token position, at different ranks. RankZephyr, originally fine-tuned with a sequence generation objective (in blue), shows considerably higher agreement between these two rankings than a pretrained LLM (in red).
Figure 3: Ranking accuracy (nDCG@10) against the reranker's per query latency in seconds. $k$ refers to the number of passages reranked for the corresponding latency. FIRST considerably outperforms sequence generation when constrained to a latency budget, as it is able to rerank significantly more candidates.
Figure 4: Plot comparing the single window inference latency for FIRST vs. generating the ranked sequence, for different numbers of candidate passages $m$.
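The agreement statistic described in the Figure 2 caption could be computed along the following lines. This is one possible reading of the caption, not the authors' evaluation code: at each rank position, it counts how often the candidate placed there by sequence generation is also the candidate placed there by sorting first-token logits. The function name and the toy inputs are hypothetical.

```python
def rank_agreement(generated_rankings, first_token_logits):
    """Per-rank % agreement between generated and logit-implied rankings.

    generated_rankings: one list of candidate labels per query, best first.
    first_token_logits: one dict per query mapping label -> first-token logit.
    Assumes every query has the same number of candidates.
    """
    n_ranks = len(generated_rankings[0])
    hits = [0] * n_ranks
    for generated, logits in zip(generated_rankings, first_token_logits):
        implied = sorted(logits, key=logits.get, reverse=True)
        for r in range(n_ranks):
            if generated[r] == implied[r]:
                hits[r] += 1
    return [100.0 * h / len(generated_rankings) for h in hits]

# Two toy queries with three candidates each.
generated = [["A", "B", "C"], ["B", "A", "C"]]
logits = [{"A": 3.0, "B": 2.0, "C": 1.0}, {"A": 1.0, "B": 2.0, "C": 1.5}]
agreement = rank_agreement(generated, logits)
print(agreement)  # [100.0, 50.0, 50.0]
```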
