Rank-DistiLLM: Closing the Effectiveness Gap Between Cross-Encoders and LLMs for Passage Re-Ranking
Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen
TL;DR
Rank-DistiLLM addresses the effectiveness gap between distilled cross-encoders and their teacher LLMs in passage re-ranking by systematically analyzing fine-tuning practices, ranking depth, and data quality. It introduces the Rank-DistiLLM distillation dataset and a novel listwise loss, ADR-MSE, and shows that cross-encoders fine-tuned on Rank-DistiLLM can match LLM effectiveness while being up to 173 times faster and 24 times more memory efficient. The method combines MS MARCO-based pretraining with high-quality LLM-distillation data (RankGPT+, RankZephyr), is evaluated in both in-domain and out-of-domain settings, and outperforms prior distillation datasets such as RankGPT and TWOLAR. The resulting re-rankers are accurate enough, and cheap enough in compute and memory, for production-scale search. The work thus provides a dataset and a scalable distillation recipe that bring LLM-level re-ranking closer to real-world deployment.
Abstract
Cross-encoders distilled from large language models (LLMs) are often more effective re-rankers than cross-encoders fine-tuned on manually labeled data. However, distilled models do not match the effectiveness of their teacher LLMs. We hypothesize that this effectiveness gap arises because previous work has not applied the best-suited methods for fine-tuning cross-encoders on manually labeled data (e.g., hard-negative sampling, deep sampling, and listwise loss functions). To close this gap, we create a new dataset, Rank-DistiLLM. Cross-encoders trained on Rank-DistiLLM achieve the effectiveness of LLMs while being up to 173 times faster and 24 times more memory efficient. Our code and data are available at https://github.com/webis-de/ECIR-25.
