TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy

Yiqun Chen, Qi Liu, Yi Zhang, Weiwei Sun, Xinyu Ma, Wei Yang, Daiting Shi, Jiaxin Mao, Dawei Yin

TL;DR

This work tackles zero-shot document ranking with large language models by addressing three core challenges: input-length constraints, sensitivity to input order, and the cost-efficiency of ranking. It introduces TourRank, a tournament-inspired framework that performs multi-stage groupings and parallel tournaments, accumulating points across rounds to form a robust final ranking. Through extensive experiments on TREC DL and BEIR, TourRank achieves state-of-the-art or competitive results with favorable cost and latency, across both commercial APIs and open-source LLMs, and demonstrates robustness to initial retrieval order and retrieval models. The approach offers a scalable, parallelizable alternative to existing listwise methods, enabling effective zero-shot ranking in practical IR settings.

Abstract

Large Language Models (LLMs) are increasingly employed in zero-shot document ranking, yielding commendable results. However, several significant challenges persist when using LLMs for ranking: (1) LLMs are constrained by limited input length, which prevents them from processing a large number of documents simultaneously; (2) the output document sequence is influenced by the input order of the documents, resulting in inconsistent ranking outcomes; (3) balancing cost against ranking performance is difficult. To tackle these issues, we introduce a novel document ranking method called TourRank, which is inspired by sports tournaments such as the FIFA World Cup. Specifically, we (1) overcome the limitation on input length and reduce ranking latency by incorporating a multi-stage grouping strategy similar to the parallel group stage of sports tournaments; (2) improve ranking performance and robustness to input order by using a points system to ensemble multiple ranking results. We test TourRank with different LLMs on the TREC DL datasets and the BEIR benchmark. The experimental results demonstrate that TourRank delivers state-of-the-art performance at a modest cost. The code for TourRank is available at https://github.com/chenyiqun/TourRank.
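
The two ideas in the abstract, multi-stage grouping and a points system that ensembles several tournaments, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `run_tournament`, `tour_rank`, and `oracle_select`, the `(group_size, keep_per_group)` schedule format, and the rule of one point per stage survived are all assumptions made for illustration. The selector is a stand-in for what would be an LLM call in a real system.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical selector: given a query, one group of doc ids, and a quota,
# return the `keep` ids judged most relevant. A real system would call an LLM here.
Selector = Callable[[str, Sequence[str], int], List[str]]
Stage = Tuple[int, int]  # (group_size, keep_per_group): an assumed schedule format


def run_tournament(query: str, docs: Sequence[str], stages: Sequence[Stage],
                   select: Selector, rng: random.Random) -> Dict[str, int]:
    """Run one multi-stage tournament and return the points earned by each doc."""
    points = {d: 0 for d in docs}
    survivors = list(docs)
    for group_size, keep in stages:
        rng.shuffle(survivors)  # vary group membership and input order
        groups = [survivors[i:i + group_size]
                  for i in range(0, len(survivors), group_size)]
        advanced: List[str] = []
        for group in groups:  # groups are independent, so these calls could run in parallel
            advanced.extend(select(query, group, min(keep, len(group))))
        for d in advanced:
            points[d] += 1  # assumed rule: one point per stage survived
        survivors = advanced
    return points


def tour_rank(query: str, docs: Sequence[str], stages: Sequence[Stage],
              select: Selector, rounds: int = 3, seed: int = 0) -> List[str]:
    """Accumulate points over several tournaments and sort by total points."""
    rng = random.Random(seed)
    total = {d: 0 for d in docs}
    for _ in range(rounds):
        for d, p in run_tournament(query, docs, stages, select, rng).items():
            total[d] += p
    # sorted() is stable, so tied docs keep their initial retrieval order
    return sorted(docs, key=lambda d: total[d], reverse=True)


# Toy demo: 24 docs with a made-up relevance score (d0 is the most relevant).
docs = [f"d{i}" for i in range(24)]
relevance = {d: -i for i, d in enumerate(docs)}


def oracle_select(query: str, group: Sequence[str], keep: int) -> List[str]:
    """Stand-in for the LLM: keep the `keep` docs with the highest toy score."""
    return sorted(group, key=relevance.__getitem__, reverse=True)[:keep]


stages = [(4, 2), (3, 1), (2, 1), (2, 1)]  # mirrors the 1982 World Cup format
ranking = tour_rank("example query", docs, stages, oracle_select, rounds=3)
print(ranking[:5])
```

With this noiseless selector the most relevant document wins every group it is placed in, so it always tops the final ranking. The positions below it depend on the random groupings, which is why accumulating points over several tournaments, rather than trusting a single one, is what makes the result less sensitive to input order.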

Paper Structure

This paper contains 31 sections, 7 equations, 8 figures, 10 tables, and 1 algorithm.

Figures (8)

  • Figure 1: The 1982 FIFA World Cup. In the first group stage, 24 teams were divided into six groups, and the top 2 out of 4 teams in each group qualified. In the second group stage, 12 teams were divided into 4 groups, and only the top 1 out of 3 teams in each group advanced. In knockout stages, only the winner in each two-team match progressed to the next stage.
  • Figure 2: (a) A basic tournament that selects the $N_k$ most relevant documents from $N_1$ candidates with $K$ stages. $P_{T_r}$ is the points vector for all candidates obtained in the $K$ stages. (b) The grouping strategy in the selection stage of the tournament.
  • Figure 3: Obtaining the accumulated points of all candidate documents across $R$ tournaments.
  • Figure 4: Sensitivity of TourRank and RankGPT to the initial ranking on TREC DL 19 and TREC DL 20.
  • Figure 5: Relationship between Cost / Latency and NDCG@10 on TREC DL 19.
  • ...and 3 more figures
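
The stage arithmetic in the Figure 1 caption (24 teams, then 12, then 4, then the knockout matches) can be checked with a short Python sketch. The helper `stage_schedule` and the `(group_size, keep_per_group)` schedule format are hypothetical, introduced here only to make the counts explicit. It returns, for each stage, the number of groups and the number of survivors.

```python
from typing import List, Sequence, Tuple


def stage_schedule(n_candidates: int,
                   stages: Sequence[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """For each (group_size, keep_per_group) stage, return (n_groups, n_survivors)."""
    schedule = []
    n = n_candidates
    for group_size, keep in stages:
        full, rest = divmod(n, group_size)      # full groups and a possible smaller last group
        n_groups = full + (1 if rest else 0)
        n = full * min(keep, group_size) + min(keep, rest)
        schedule.append((n_groups, n))
    return schedule


# 1982 World Cup format from Figure 1: two group stages, then semi-finals and final.
world_cup_1982 = [(4, 2), (3, 1), (2, 1), (2, 1)]
print(stage_schedule(24, world_cup_1982))
# prints [(6, 12), (4, 4), (2, 2), (1, 1)]
```

Under the assumption that each group corresponds to one selection call, a single pass over 24 candidates in this format would need 6 + 4 + 2 + 1 = 13 calls, and the calls within a stage are independent of one another.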