ListT5: Listwise Reranking with Fusion-in-Decoder Improves Zero-shot Retrieval

Soyoung Yoon, Eunbi Choi, Jiyeon Kim, Hyeongu Yun, Yireun Kim, Seung-won Hwang

TL;DR

ListT5 introduces Fusion-in-Decoder-based listwise reranking, which jointly evaluates multiple passages and produces a sorted permutation, to address zero-shot retrieval. It extends a basic $m \to r$ unit (rank $m$ passages, keep the top $r$) to full $n \to k$ reranking using an $m$-ary tournament tree, achieving $O(n + k \log_m n)$ complexity and improving efficiency over prior listwise and pairwise methods. Empirically, ListT5 outperforms RankT5 on BEIR by around $+1.3$ NDCG@10 on BM25 Top-100 and is robust to positional bias, while keeping FLOPs competitive with pointwise baselines. Extensive ablations and stability analyses show that generating passage identifiers in order of increasing relevance and using tournament-based inference both contribute to the zero-shot gains; code and models are open-sourced for reproducibility and further research.
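
As a rough worked count of where the $O(n + k \log_m n)$ bound comes from, consider an illustrative setting that is an assumption of this note rather than a figure reported by the paper: $r = 1$, a fixed tree over $n = 100$ candidates, $m = 5$, $k = 10$, and cached outputs for every group that did not change.

$$
\underbrace{\lceil 100/5 \rceil + \lceil 20/5 \rceil + \lceil 4/5 \rceil}_{\text{build the tree once}} = 20 + 4 + 1 = 25,
\qquad
\underbrace{(k-1)\,\lceil \log_5 100 \rceil}_{\text{one root path per additional winner}} = 9 \cdot 3 = 27 .
$$

Under these assumptions the top 10 are obtained with at most $25 + 27 = 52$ forward passes of the basic unit: the first term grows linearly in $n$, the second as $k \log_m n$.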

Abstract

We propose ListT5, a novel reranking approach based on Fusion-in-Decoder (FiD) that handles multiple candidate passages at both training and inference time. We also introduce an efficient inference framework for listwise ranking based on $m$-ary tournament sort with output caching. We evaluate and compare our model on the BEIR benchmark for the zero-shot retrieval task, demonstrating that ListT5 (1) outperforms the state-of-the-art RankT5 baseline with a notable +1.3 gain in the average NDCG@10 score, (2) has an efficiency comparable to pointwise ranking models and surpasses the efficiency of previous listwise ranking models, and (3) overcomes the lost-in-the-middle problem of previous listwise rerankers. Our code, model checkpoints, and the evaluation framework are fully open-sourced at https://github.com/soyoung97/ListT5.
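
The idea of $m$-ary tournament sort with output caching can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_unit` is a hypothetical stand-in for one ListT5 ($r$ = 1) forward pass, `relevance` is a made-up scoring function, and the fixed-slot tree layout is an assumption chosen to keep cached groups reusable.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A unit receives up to m candidate ids and returns the single most relevant one (r = 1).
Unit = Callable[[Tuple[int, ...]], int]


def tournament_top_k(
    candidates: Sequence[int], k: int, m: int, unit: Unit
) -> Tuple[List[int], int]:
    """Return (top-k ids, most relevant first; number of unit calls made).

    Candidate ids are assumed to be unique.
    """
    cache: Dict[Tuple[int, ...], int] = {}  # group of ids -> cached winner
    calls = 0

    def best_of(group: Sequence[Optional[int]]) -> Optional[int]:
        nonlocal calls
        alive = tuple(x for x in group if x is not None)
        if not alive:
            return None          # every candidate in this group was already extracted
        if len(alive) == 1:
            return alive[0]      # nothing to compare, no model call needed
        if alive not in cache:   # only groups that changed trigger a new call
            cache[alive] = unit(alive)
            calls += 1
        return cache[alive]

    slots: List[Optional[int]] = list(candidates)  # fixed leaf positions of the tree
    top: List[int] = []
    for _ in range(min(k, len(slots))):
        level: List[Optional[int]] = slots
        while len(level) > 1:    # winners of each group of m advance one level
            level = [best_of(level[i:i + m]) for i in range(0, len(level), m)]
        champion = level[0]
        top.append(champion)
        slots[slots.index(champion)] = None  # remove the winner, then replay its path
    return top, calls


def relevance(i: int) -> int:
    """Hypothetical relevance score; distinct for ids 0..99."""
    return (i * 37) % 101


def toy_unit(group: Tuple[int, ...]) -> int:
    """Hypothetical stand-in for a listwise model call that picks the best passage."""
    return max(group, key=relevance)


top10, n_calls = tournament_top_k(list(range(100)), k=10, m=5, unit=toy_unit)
print(top10, n_calls)
```

After the first winner is extracted, only the groups on that winner's path to the root change, so every other group is served from the cache; this is what keeps the cost of each additional winner proportional to the tree depth rather than to $n$.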

Paper Structure (61 sections, 2 equations, 9 figures, 17 tables)

Figures (9)

  • Figure 1: Operating unit of ListT5. ListT5 jointly considers multiple (5) candidate passages at once using FiD, each concatenated with the query and an identifier. The output is an ordered list of the identifiers (numbers) in which the most relevant passage comes last.
  • Figure 2: Two variants of ListT5, ($r$=1) and ($r$=2). The underlying model is identical and only the inference method varies. ($r$=1) keeps only the top 1 relevant index from the output, and ($r$=2) keeps top 2 relevant indices.
  • Figure 3: Illustration of our inference framework using $m$-ary tournament sort, with ListT5 ($r$ = 1) as the basic unit. Given $n$ candidates, we can order the top-$k$ most relevant passages in $O(n+k\log{n})$ asymptotic complexity. Either ($r$ = 1) or ($r$ = 2) can serve as the basic unit, but the uppermost unit always outputs a single passage ($r$ = 1). We fix $m$ to 5 in our experiments. Full illustration in Appendix Fig. \ref{fig:fig_appendix_inference}.
  • Figure 4: Real-time FLOPs comparison of the models on T5-base, including DuoT5 and the sliding window variants of ListT5. The reported BEIR performance is averaged over a subset of BEIR, the same subset as in Tab. \ref{table/rankgpt_duot5}.
  • Figure 5: Measuring the robustness to positional bias by shuffling the index of the relevant passage. In the figure, passage 4 is the gold (ground truth) relevant passage.
  • ...and 4 more figures
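
The operating unit in Figure 1, the top-$r$ selection in Figure 2, and the positional-bias probe in Figure 5 can be sketched as plain string handling. The prompt template (`Query: ..., Index: ..., Context: ...`), the space-separated output format, and all function names below are assumptions made for illustration; the captions only state that each passage is concatenated with the query and an identifier and that the most relevant identifier is generated last.

```python
from typing import List


def build_fid_inputs(query: str, passages: List[str]) -> List[str]:
    """One encoder input per passage: query, 1-based identifier, passage text.

    The template is hypothetical; FiD encodes each string separately and
    lets the decoder attend over all of them jointly.
    """
    return [
        f"Query: {query}, Index: {i}, Context: {passage}"
        for i, passage in enumerate(passages, start=1)
    ]


def keep_top_r(decoded: str, r: int) -> List[int]:
    """Parse an output such as '3 1 2 5 4' (least to most relevant).

    Returns the r most relevant identifiers, best first.
    """
    ids = [int(token) for token in decoded.split()]
    return ids[::-1][:r]


def place_gold(gold: str, distractors: List[str], position: int) -> List[str]:
    """Put the gold passage at a chosen 0-based position among the distractors."""
    return distractors[:position] + [gold] + distractors[position:]


# Positional-bias probe: present the same passages with the gold one at every index.
distractors = ["d1", "d2", "d3", "d4"]
orderings = [place_gold("gold", distractors, p) for p in range(len(distractors) + 1)]
inputs = build_fid_inputs("what is listwise reranking?", orderings[3])

print(keep_top_r("3 1 2 5 4", r=1))  # ListT5 (r=1) keeps only the last identifier
print(keep_top_r("3 1 2 5 4", r=2))  # ListT5 (r=2) keeps the last two
```

A model that is robust to positional bias should select the gold passage regardless of which of the `orderings` it is shown.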