Ranked List Truncation for Large Language Model-based Re-Ranking

Chuan Meng; Negar Arabzadeh; Arian Askari; Mohammad Aliannejadi; Maarten de Rijke

TL;DR

This paper investigates ranked list truncation (RLT) in a retrieve-then-re-rank pipeline, focusing on LLM-based re-ranking. By reproducing eight RLT methods on TREC-DL 19 and 20, across lexical, sparse, and dense retrievers and two re-rankers, it shows that many findings from the retrieval setting do not generalize to re-ranking: fixed re-ranking depths often match or outperform supervised RLT methods under several conditions, and the type of retriever substantially affects RLT effectiveness and efficiency. Distribution-based supervised methods generally outperform sequential labeling methods when effectiveness and efficiency are balanced, although the benefit is not universal, and data scarcity can hamper learning. The work provides practical insights for deploying RLT in re-ranking, highlights the importance of retriever choice, and releases open-source code and data to support reproducibility and further research.
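The TL;DR contrasts three ways of choosing a per-query re-ranking cut-off. As a rough illustration only (our simplification, not the authors' implementations), their decision rules can be sketched as follows, assuming that a distribution-based method outputs one probability per candidate cut-off and picks the most likely one, and that a sequential labeling method tags each rank position and truncates at the first position tagged "stop". The function names, the default depth of 100, and the 0.5 threshold are hypothetical, and the models that produce the probabilities are not shown.

```python
from typing import Sequence


def fixed_depth_cutoff(n_candidates: int, depth: int = 100) -> int:
    """Fixed re-ranking depth: the same cut-off for every query (depth is hypothetical)."""
    return min(depth, n_candidates)


def distribution_cutoff(cutoff_probs: Sequence[float]) -> int:
    """Distribution-based rule (simplified): cutoff_probs[i] is the predicted
    probability that truncating after rank i + 1 is best; return the most likely cut-off."""
    if not cutoff_probs:
        return 0
    best_index = max(range(len(cutoff_probs)), key=lambda i: cutoff_probs[i])
    return best_index + 1


def sequential_labeling_cutoff(stop_probs: Sequence[float], threshold: float = 0.5) -> int:
    """Sequential-labeling rule (simplified): stop_probs[i] is the predicted
    probability that rank i + 1 is labelled 'stop'; keep everything up to and
    including the first such rank, or the whole list if no rank is labelled 'stop'."""
    for i, p in enumerate(stop_probs):
        if p >= threshold:
            return i + 1
    return len(stop_probs)


if __name__ == "__main__":
    print(fixed_depth_cutoff(1000))                          # 100
    print(distribution_cutoff([0.1, 0.6, 0.3]))              # 2
    print(sequential_labeling_cutoff([0.1, 0.2, 0.7, 0.9]))  # 3
```

The fixed-depth rule ignores the query entirely, which is why it is cheap and predictable; the two learned rules differ mainly in whether the model scores whole cut-offs or individual positions.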

Abstract

We study ranked list truncation (RLT) from a novel "retrieve-then-re-rank" perspective, where we optimize re-ranking by truncating the retrieved list (i.e., trimming re-ranking candidates). RLT is crucial for re-ranking because it can improve re-ranking efficiency by sending variable-length candidate lists to a re-ranker on a per-query basis. It also has the potential to improve re-ranking effectiveness. Despite its importance, there is limited research into applying RLT methods to this new perspective. To address this research gap, we reproduce existing RLT methods in the context of re-ranking, especially the newly emerged large language model (LLM)-based re-ranking. In particular, we examine to what extent established findings on RLT for retrieval generalize to the "retrieve-then-re-rank" setup from three perspectives: (i) assessing RLT methods in the context of LLM-based re-ranking with lexical first-stage retrieval, (ii) investigating the impact of different types of first-stage retrievers on RLT methods, and (iii) investigating the impact of different types of re-rankers on RLT methods. We perform experiments on the TREC 2019 and 2020 deep learning tracks, investigating 8 RLT methods for pipelines involving 3 retrievers and 2 re-rankers. We reach new insights into RLT methods in the context of re-ranking.
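To make the "retrieve-then-re-rank" setup concrete, here is a minimal sketch, assuming a retrieved list of (doc_id, score) pairs sorted by first-stage score and a re-ranker exposed as a per-document scoring function. The cut-off predictor `score_gap_cutoff` is a hypothetical unsupervised rule chosen only for illustration; it is not one of the eight RLT methods studied in the paper. Candidates below the cut-off are assumed to keep their first-stage order.

```python
from typing import Callable, List, Tuple

# A retrieved list: (doc_id, first-stage score) pairs, sorted by descending score.
Ranked = List[Tuple[str, float]]


def truncate_then_rerank(
    retrieved: Ranked,
    predict_cutoff: Callable[[Ranked], int],
    rerank_score: Callable[[str], float],
) -> List[str]:
    """Re-rank only the top-k candidates, with k predicted per query.

    Candidates below the cut-off are not sent to the re-ranker; they keep
    their first-stage order and are appended after the re-ranked head.
    """
    k = max(0, min(predict_cutoff(retrieved), len(retrieved)))
    head = [doc for doc, _ in retrieved[:k]]
    tail = [doc for doc, _ in retrieved[k:]]
    # The (expensive) re-ranker is called exactly k times for this query.
    reranked_head = sorted(head, key=rerank_score, reverse=True)
    return reranked_head + tail


def score_gap_cutoff(retrieved: Ranked, ratio: float = 0.5) -> int:
    """Hypothetical rule for illustration only: keep candidates whose first-stage
    score is at least `ratio` times the top score. Assumes positive scores
    (e.g., BM25) and a list sorted by descending score."""
    if not retrieved:
        return 0
    top_score = retrieved[0][1]
    return sum(1 for _, score in retrieved if score >= ratio * top_score)


if __name__ == "__main__":
    retrieved = [("d1", 12.0), ("d2", 9.0), ("d3", 5.0), ("d4", 2.0)]
    # Stand-in for an LLM-based re-ranker's relevance scores.
    toy_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.99}
    print(truncate_then_rerank(retrieved, score_gap_cutoff, lambda d: toy_scores[d]))
    # ['d2', 'd1', 'd3', 'd4']
```

In this toy run only two re-ranker calls are made. Document d4 would have received the highest re-ranker score but is never re-ranked, which illustrates the efficiency versus effectiveness trade-off a per-query cut-off has to manage.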

Paper Structure

This paper contains 13 sections, 5 equations, 9 figures, and 4 tables.

Introduction
Motivation
Preliminaries and task definition
Reproducibility Methodology
Research questions and experimental design
Experimental setup
Results and discussions
RLT for LLM-based re-ranking
The impact of retriever types on RLT
RLT for pre-trained LM-based re-ranking
Error analysis
Related Work
Conclusions & Future Work

Figures (9)

Figure 1: A schematic diagram of RLT in the "retrieve-then-re-rank" setup.
Figure 2: Cumulative distribution function of oracle cut-offs for RepLLaMA–RankLLaMA (a, b) and BM25–RankLLaMA (c, d) on TREC-DL 19 and 20. The oracle cut-offs are the minimum re-ranking cut-offs that yield the highest nDCG@10 values.
Figure 3: nDCG@10 values for RepLLaMA–RankLLaMA (a, b) and BM25–RankLLaMA (c, d) w.r.t. re-ranking cut-offs on TREC-DL 19 and 20.
Figure 4: Average EET values across TREC-DL 20 queries w.r.t. re-ranking cut-offs. nDCG@10 is used as the effectiveness measure $\sigma$. $\beta$ values of 0, 1, and 2 represent prioritizing effectiveness, balancing effectiveness and efficiency, and emphasizing efficiency, respectively.
Figure 5: A comparison of RLT methods in predicting re-ranking cut-off points for BM25–RankLLaMA on TREC-DL 20. $\beta$=0 ($\eta$=0.4), $\beta$=1 ($\eta$=0.5), and $\beta$=2 ($\eta$=0.6) represent effectiveness emphasis, balance, and efficiency emphasis, respectively.
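Figure 2 defines the oracle cut-off as the minimum re-ranking cut-off that yields the highest nDCG@10. A minimal sketch of that definition follows, assuming a fixed set of candidate cut-offs, re-ranker scores for every retrieved document, and graded relevance labels; documents beyond the cut-off are assumed to keep their first-stage order. The nDCG@10 variant (linear gains, log2 rank discount) and all helper names are our assumptions, not the paper's code.

```python
import math
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def ndcg_at_k(ranking: Sequence[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k with linear gains and a log2(rank + 1) discount (assumed variant)."""
    def dcg(gains: Sequence[int]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    ideal = dcg(sorted(qrels.values(), reverse=True))
    if ideal == 0:
        return 0.0
    return dcg([qrels.get(doc, 0) for doc in ranking]) / ideal


def rerank_at(retrieved: List[str], rerank_scores: Dict[str, float], k: int) -> List[str]:
    """Re-rank the top-k retrieved documents; the rest keep first-stage order."""
    head = sorted(retrieved[:k], key=lambda doc: rerank_scores[doc], reverse=True)
    return head + retrieved[k:]


def oracle_cutoff(
    retrieved: List[str],
    rerank_scores: Dict[str, float],
    qrels: Dict[str, int],
    cutoffs: Iterable[int],
) -> Tuple[Optional[int], float]:
    """Smallest cut-off whose re-ranked list reaches the highest nDCG@10."""
    best_k, best_score = None, -1.0
    for k in sorted(cutoffs):
        score = ndcg_at_k(rerank_at(retrieved, rerank_scores, k), qrels, k=10)
        if score > best_score:  # strict '>' keeps the smallest k among ties
            best_k, best_score = k, score
    return best_k, best_score


if __name__ == "__main__":
    retrieved = ["d1", "d2", "d3", "d4", "d5"]
    rerank_scores = {"d1": 0.1, "d2": 0.3, "d3": 0.9, "d4": 0.2, "d5": 0.8}
    qrels = {"d3": 2, "d5": 1}
    print(oracle_cutoff(retrieved, rerank_scores, qrels, range(6)))  # (5, 1.0)
```

Because ties are broken toward the smaller cut-off, restricting the candidates to `[0, 3, 4]` in this toy example returns 3 rather than 4: re-ranking the fourth document does not change nDCG@10, so the extra re-ranker call buys nothing.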