LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking
Qi Liu, Haozhe Duan, Yiqun Chen, Quanfeng Lu, Weiwei Sun, Jiaxin Mao
TL;DR
LLM4Ranking presents a modular, easy-to-use toolkit that unifies access to both open-source and API-based LLMs for document reranking. By decoupling LLM interfaces, ranking logic, and model implementations, it supports pointwise, pairwise, and listwise paradigms, along with generation, log-likelihood, and logits-based methods, and provides training and evaluation pipelines. The framework demonstrates zero-shot and supervised reranking experiments across multiple datasets, highlighting the practical effectiveness of API-backed models like RankGPT and TourRank, as well as the value of fine-tuning smaller models. This toolkit facilitates reproducible research and practical deployment for retrieval-augmented generation and other IR tasks, with broad applicability across academic and real-world settings.
Abstract
Utilizing large language models (LLMs) for document reranking has been a popular and promising research direction in recent years, and many studies are dedicated to improving the effectiveness and efficiency of LLM-based reranking. LLM-based reranking can also be applied in many real-world applications, such as search engines and retrieval-augmented generation. In response to the growing demand from both research and practice, we introduce a unified framework, LLM4Ranking, which enables users to adopt different ranking methods using open-source or closed-source API-based LLMs. Our framework provides a simple and extensible interface for document reranking with LLMs, as well as easy-to-use evaluation and fine-tuning scripts for this task. We conducted experiments based on this framework and evaluated various models and methods on several widely used datasets, providing reproducible results on utilizing LLMs for document reranking. Our code is publicly available at https://github.com/liuqi6777/llm4ranking.
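To make the pointwise, pairwise, and listwise paradigms named in the TL;DR concrete, the following is a minimal Python sketch. It is not the LLM4Ranking API: every function name here is hypothetical, and `toy_relevance` is a term-overlap stand-in for what would be an LLM call in practice. The listwise variant assumes a back-to-front sliding window, which is one common way to apply a listwise ranker to a candidate list longer than the model can order at once.

```python
from functools import cmp_to_key
from typing import Callable, List


def toy_relevance(query: str, doc: str) -> float:
    """Hypothetical stand-in for an LLM relevance score: counts query-term hits."""
    terms = set(query.lower().split())
    return float(sum(1 for tok in doc.lower().split() if tok in terms))


def pointwise_rerank(query: str, docs: List[str],
                     score: Callable[[str, str], float] = toy_relevance) -> List[str]:
    # Pointwise: score each document independently, then sort by score.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)


def pairwise_rerank(query: str, docs: List[str],
                    prefer: Callable[[str, str, str], bool]) -> List[str]:
    # Pairwise: the ranker only answers "is a more relevant than b?";
    # a comparison sort turns those answers into a full ordering.
    def compare(a: str, b: str) -> int:
        if prefer(query, a, b):
            return -1
        if prefer(query, b, a):
            return 1
        return 0
    return sorted(docs, key=cmp_to_key(compare))


def listwise_rerank(query: str, docs: List[str],
                    permute: Callable[[str, List[str]], List[str]],
                    window: int = 4, step: int = 2) -> List[str]:
    # Listwise: the ranker reorders a whole window of documents at once.
    # The window slides from the end of the list towards the front so that
    # relevant documents can move up across successive windows.
    ranked = list(docs)
    end = len(ranked)
    while end > 0:
        start = max(0, end - window)
        ranked[start:end] = permute(query, ranked[start:end])
        if start == 0:
            break
        end -= step
    return ranked


if __name__ == "__main__":
    query = "llm reranking"
    docs = [
        "cats sleep a lot",
        "llm reranking with llm",
        "document reranking",
        "weather today",
    ]
    print(pointwise_rerank(query, docs))
    print(pairwise_rerank(
        query, docs,
        lambda q, a, b: toy_relevance(q, a) > toy_relevance(q, b)))
    print(listwise_rerank(query, docs, pointwise_rerank, window=2, step=1))
```

In this sketch the three functions differ only in what they ask of the underlying ranker: a score per document, a preference per pair, or a permutation per window. Swapping `toy_relevance` for a real model call would leave the ranking logic unchanged, which is the kind of decoupling the TL;DR describes.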
