PyTerrier-GenRank: The PyTerrier Plugin for Reranking with Large Language Models
Kaustubh D. Dhole
TL;DR
Problem: LLM-based reranking requires extensive hyperparameter exploration across prompts, models, and reformulation strategies in IR pipelines. Approach: PyTerrier-GenRank is a PyTerrier plugin that wraps RankLLM-style rerankers into modular pipelines, supporting pointwise, pairwise, and listwise prompting, with sample code for HuggingFace and OpenAI endpoints. Contributions: an installable, Python-friendly interface that enables reproducible experiments and rapid comparisons across hyperparameter settings. Findings: sanity checks re-implement RankVicuna and RankGPT and show Llama-Spark to be the most effective 8B zero-shot reranker in the tested setup; the paper also discusses ethical considerations.
Abstract
Using LLMs as rerankers requires experimenting with various hyperparameters, such as prompt formats, model choice, and reformulation strategies. We introduce PyTerrier-GenRank, a PyTerrier plugin that facilitates seamless reranking experiments with LLMs and supports popular ranking strategies such as pointwise and listwise prompting. We validate the plugin through HuggingFace- and OpenAI-hosted endpoints.
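As a rough illustration of the difference between the pointwise and listwise strategies named above, the following minimal sketch may help. It does not use PyTerrier-GenRank's actual API: every function name here is hypothetical, and `toy_relevance` is only a term-overlap stand-in for what would be an LLM call in practice. The listwise output format (`[2] > [1]`) is likewise an assumption, modelled on the permutation-style output commonly used by listwise rerankers.

```python
import re
from typing import Callable, List, Tuple

Doc = Tuple[str, str]  # (docno, text)


def toy_relevance(query: str, text: str) -> float:
    """Hypothetical stand-in for an LLM relevance judgment.

    Returns the fraction of query terms that also occur in the text.
    """
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)


def pointwise_rerank(
    query: str,
    docs: List[Doc],
    score: Callable[[str, str], float] = toy_relevance,
) -> List[Doc]:
    """Pointwise: score each document independently, then sort by score."""
    return sorted(docs, key=lambda d: score(query, d[1]), reverse=True)


def build_listwise_prompt(query: str, docs: List[Doc]) -> str:
    """Listwise: show all candidates at once and ask for a permutation."""
    passages = "\n".join(
        f"[{i}] {text}" for i, (_, text) in enumerate(docs, start=1)
    )
    return (
        f"Query: {query}\n{passages}\n"
        "Rank the passages by relevance, e.g. [2] > [1]."
    )


def parse_listwise_output(output: str, docs: List[Doc]) -> List[Doc]:
    """Turn a permutation string such as '[3] > [1]' into a reordered list.

    Invalid or repeated identifiers are ignored; documents the model did
    not mention are appended in their original order.
    """
    order: List[int] = []
    for match in re.findall(r"\[(\d+)\]", output):
        idx = int(match) - 1
        if 0 <= idx < len(docs) and idx not in order:
            order.append(idx)
    order += [i for i in range(len(docs)) if i not in order]
    return [docs[i] for i in order]
```

In the pointwise case the model is queried once per document, whereas in the listwise case a single prompt covers the whole candidate window and the response must be parsed back into a ranking, which is why prompt format becomes a hyperparameter worth varying.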
