Ranked List Truncation for Large Language Model-based Re-Ranking
Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke
TL;DR
This paper investigates ranked list truncation (RLT) in a retrieve-then-re-rank pipeline, focusing on LLM-based re-ranking. By reproducing eight RLT methods across lexical, sparse, and dense retrievers and two re-rankers on TREC-DL 19/20, it shows that many findings from retrieval settings do not generalize to re-ranking. Fixed re-ranking depths often match or outperform supervised RLT under several conditions, and the retriever type substantially affects the effectiveness and efficiency of RLT. Distribution-based supervised methods generally outperform sequential labeling methods in balanced scenarios, though the benefits are not universal, and data scarcity can hamper learning. The work provides practical insights for deploying RLT in re-ranking, highlights the importance of retriever choice, and offers open-source code and data to foster reproducibility and further research.
Abstract
We study ranked list truncation (RLT) from a novel "retrieve-then-re-rank" perspective, where we optimize re-ranking by truncating the retrieved list (i.e., trimming the re-ranking candidates). RLT is crucial for re-ranking because it can improve re-ranking efficiency by sending variable-length candidate lists to a re-ranker on a per-query basis. It also has the potential to improve re-ranking effectiveness. Despite its importance, there is limited research into applying RLT methods from this new perspective. To address this research gap, we reproduce existing RLT methods in the context of re-ranking, especially the newly emerged large language model (LLM)-based re-ranking. In particular, we examine to what extent established findings on RLT for retrieval generalize to the "retrieve-then-re-rank" setup from three perspectives: (i) assessing RLT methods in the context of LLM-based re-ranking with lexical first-stage retrieval, (ii) investigating the impact of different types of first-stage retrievers on RLT methods, and (iii) investigating the impact of different types of re-rankers on RLT methods. We perform experiments on the TREC 2019 and 2020 deep learning tracks, investigating 8 RLT methods for pipelines involving 3 retrievers and 2 re-rankers. We reach new insights into RLT methods in the context of re-ranking.
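To make the setup concrete, the following is a minimal Python sketch of truncation inside a retrieve-then-re-rank pipeline. It is an illustration under stated assumptions, not one of the eight RLT methods studied in the paper: the names `fixed_cutoff`, `score_ratio_cutoff`, and `truncate_then_rerank` are hypothetical, the score-ratio rule is a toy heuristic that assumes positive first-stage scores, and appending the untouched tail after the re-ranked head is a design choice made here for illustration.

```python
from typing import Callable, Dict, List, Tuple

# (doc_id, first-stage retrieval score), sorted by descending score.
Doc = Tuple[str, float]


def fixed_cutoff(ranked: List[Doc], k: int = 3) -> int:
    """Fixed re-ranking depth: the same cutoff for every query."""
    return min(k, len(ranked))


def score_ratio_cutoff(ranked: List[Doc], min_ratio: float = 0.5) -> int:
    """Toy per-query cutoff (hypothetical heuristic, not from the paper).

    Keeps the leading documents whose score is at least `min_ratio` times
    the top score. Assumes positive retrieval scores.
    """
    if not ranked:
        return 0
    top_score = ranked[0][1]
    k = 0
    for _, score in ranked:
        if score < min_ratio * top_score:
            break
        k += 1
    return k


def truncate_then_rerank(
    ranked: List[Doc],
    cutoff_fn: Callable[[List[Doc]], int],
    rerank_score: Callable[[str], float],
) -> Tuple[List[str], int]:
    """Re-rank only the top-k candidates, where k is chosen per query.

    Returns the final ordering of doc ids and the number of documents
    sent to the re-ranker (a proxy for re-ranking cost).
    """
    k = cutoff_fn(ranked)
    head, tail = ranked[:k], ranked[k:]
    # Only the head is scored by the (expensive) re-ranker.
    reranked = sorted(head, key=lambda d: rerank_score(d[0]), reverse=True)
    # Assumption of this sketch: the tail keeps its first-stage order.
    return [doc_id for doc_id, _ in reranked] + [doc_id for doc_id, _ in tail], k


# Toy data: first-stage scores and stand-in re-ranker scores.
retrieved: List[Doc] = [("d1", 10.0), ("d2", 9.0), ("d3", 4.0), ("d4", 3.0)]
toy_reranker: Dict[str, float] = {"d1": 0.1, "d2": 0.9, "d3": 0.99, "d4": 0.5}

per_query_order, per_query_k = truncate_then_rerank(
    retrieved, score_ratio_cutoff, toy_reranker.__getitem__
)
fixed_order, fixed_k = truncate_then_rerank(
    retrieved, fixed_cutoff, toy_reranker.__getitem__
)
```

In this toy run the per-query rule sends two documents to the re-ranker while the fixed depth sends three. The returned cutoff stands in for re-ranking cost, which is the efficiency side of the trade-off the abstract describes; the final ordering is what an effectiveness metric would be computed on.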
