RankLLM: A Python Package for Reranking with LLMs

Sahel Sharifymoghaddam, Ronak Pradeep, Andre Slavescu, Ryan Nguyen, Andrew Xu, Zijian Chen, Yilin Zhang, Yidi Chen, Jasper Xian, Jimmy Lin

TL;DR

RankLLM tackles the fragmentation in LLM-based reranking by delivering a modular, open-source Python package that supports pointwise, pairwise, and listwise reranking with a broad set of LLMs. It integrates retrieval (via Pyserini), evaluation, analysis, and training to provide end-to-end, reproducible workflows, including two-click reproduction (2CR) pages. The framework handles large candidate lists through a sliding window approach and enforces robust post-processing of model outputs, with configurable prompt templates and diverse coordinators (Mono-T5, Duo-T5, LiT5, SafeOpenai, SafeGenai, vLLM-based OSLLM, etc.). This work enables rapid experimentation and benchmarking in retrieval-augmented pipelines, promotes transparency and replicability, and supports end-to-end deployment through integration with LangChain, LlamaIndex, and popular inference backends.

Abstract

The adoption of large language models (LLMs) as rerankers in multi-stage retrieval systems has gained significant traction in academia and industry. These models refine a candidate list of retrieved documents, often through carefully designed prompts, and are typically used in applications built on retrieval-augmented generation (RAG). This paper introduces RankLLM, an open-source Python package for reranking that is modular, highly configurable, and supports both proprietary and open-source LLMs in customized reranking workflows. To improve usability, RankLLM features optional integration with Pyserini for retrieval and provides integrated evaluation for multi-stage pipelines. Additionally, RankLLM includes a module for detailed analysis of input prompts and LLM responses, addressing reliability concerns with LLM APIs and non-deterministic behavior in Mixture-of-Experts (MoE) models. This paper presents the architecture of RankLLM, along with a detailed step-by-step guide and sample code. We reproduce results from RankGPT, LRL, RankVicuna, RankZephyr, and other recent models. RankLLM integrates with common inference frameworks and a wide range of LLMs. This compatibility allows for quick reproduction of reported results, helping to speed up both research and real-world applications. The complete repository is available at rankllm.ai, and the package can be installed via PyPI.

Paper Structure

This paper contains 29 sections, 2 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Overview of RankLLM with the Reranker component at the center. Other components facilitating optional flows include retrieval with Pyserini, evaluation, invocations analysis, and model training.
  • Figure 2: Three reranking methods applied to a query $Q$ and a list of $k$ candidates $C_i$: (a) pointwise reranking, (b) pairwise reranking, and (c) listwise reranking with a sliding window of size four and a stride of two.
  • Figure 3: Data classes for reranking requests and results throughout the RankLLM pipeline. After reranking, the invocation history is also included in the results.
  • Figure 4: Sample requests creation either inline (lines 5–22) or via loading them from a JSONL file (lines 25–28).
  • Figure 5: Sample requests creation via Pyserini retrieval.
  • ...and 4 more figures
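
The listwise sliding window named in the Figure 2 caption (window size four, stride two) can be illustrated with a minimal Python sketch. This is not RankLLM's API: the function name `sliding_window_rerank` and the `rerank_window` callback are hypothetical, the back-to-front traversal is an assumption borrowed from common listwise reranking practice, and a simple sort stands in for the LLM call.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sliding_window_rerank(
    candidates: Sequence[T],
    rerank_window: Callable[[List[T]], List[T]],
    window_size: int = 4,
    stride: int = 2,
) -> List[T]:
    """Rerank a long candidate list by reordering overlapping windows.

    Assumption: windows move from the bottom of the list to the top, so
    strong candidates can bubble upward across overlapping windows.
    `rerank_window` is a hypothetical stand-in for one listwise LLM call.
    """
    if window_size <= 0 or not 0 < stride <= window_size:
        raise ValueError("require 0 < stride <= window_size")

    ranked = list(candidates)
    end = len(ranked)
    while end > 0:
        start = max(0, end - window_size)
        # Reorder only the current window; the rest of the list is untouched.
        ranked[start:end] = rerank_window(ranked[start:end])
        if start == 0:
            break  # the window has reached the top of the list
        end -= stride
    return ranked


# Toy example: each candidate is its own relevance score, and a
# descending sort plays the role of the model's window-level ranking.
scores = [1, 5, 3, 8, 2, 9, 4, 7]
ranked = sliding_window_rerank(scores, lambda w: sorted(w, reverse=True))
print(ranked[:2])  # the two highest-scoring candidates end up on top
```

With a single back-to-front pass, only the top of the list is reliably ordered; the tail keeps whatever order the last window touching it produced, which is why a pass like this suits top-heavy metrics.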