Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers

Weiwei Sun, Zheng Chen, Xinyu Ma, Lingyong Yan, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, Zhaochun Ren

TL;DR

The paper tackles the inefficiency and instability of LLM-based zero-shot relevance ranking, particularly for pairwise and listwise prompting. It introduces Instruction Distillation, which transfers the pairwise ranking capability of an open-source LLM into a more efficient pointwise prompting regime. The three-stage pipeline (BM25 candidate generation, teacher inference with pairwise ranking, and student learning via RankNet loss) achieves 10–100× speedups while attaining competitive or superior ranking performance on the BEIR, TREC, and ReDial benchmarks. The approach outperforms the supervised monoT5 baseline and remains competitive with state-of-the-art zero-shot methods, and the results, obtained with FLAN-T5 family models, are reproducible.
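The student-learning stage can be illustrated with a minimal sketch of a RankNet-style loss. It assumes the textbook formulation, in which every pair of documents that the teacher orders contributes log(1 + exp(s_j - s_i)) when document i is ranked above document j. The function name `ranknet_loss` and its arguments are hypothetical and are not taken from the authors' code.

```python
import math


def ranknet_loss(student_scores, teacher_ranks):
    """RankNet-style loss for one query (illustrative sketch).

    student_scores[i]: pointwise score the student assigns to document i.
    teacher_ranks[i]:  rank the teacher assigns to document i (1 = best).
    """
    loss = 0.0
    n = len(student_scores)
    for i in range(n):
        for j in range(n):
            # Only pairs where the teacher places i above j contribute.
            if teacher_ranks[i] < teacher_ranks[j]:
                margin = student_scores[j] - student_scores[i]
                loss += math.log1p(math.exp(margin))
    return loss


# Student scores that agree with the teacher order give a lower loss
# than scores that reverse it.
agree = ranknet_loss([2.0, 1.0, 0.0], [1, 2, 3])
reverse = ranknet_loss([0.0, 1.0, 2.0], [1, 2, 3])
```

Because the loss depends only on score differences, the student is free to produce scores on any scale as long as their order matches the teacher's ranking.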

Abstract

Recent studies have demonstrated the great potential of Large Language Models (LLMs) serving as zero-shot relevance rankers. The typical approach involves making comparisons between pairs or lists of documents. Although effective, these listwise and pairwise methods are inefficient and rely heavily on intricate prompt engineering. To tackle this problem, we introduce a novel instruction distillation method. The key idea is to distill the pairwise ranking ability of open-source LLMs into a simpler but more efficient pointwise ranking approach. Specifically, given the same LLM, we first rank documents using the effective pairwise approach with complex instructions, and then distill the teacher predictions into the pointwise approach with simpler instructions. Evaluation results on the BEIR, TREC, and ReDial datasets demonstrate that instruction distillation can improve efficiency by 10 to 100x and also enhance the ranking performance of LLMs. Furthermore, our approach surpasses the performance of existing supervised methods like monoT5 and is on par with the state-of-the-art zero-shot methods. The code to reproduce our results is available at www.github.com/sunnweiwei/RankGPT.
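To make the cost of the pairwise teacher concrete, the sketch below ranks documents by counting pairwise wins over all ordered pairs. This is one simple aggregation scheme assumed here for illustration, not necessarily the one the paper uses. The comparator `prefer` stands in for an LLM call, and `toy_prefer` is a hypothetical word-overlap heuristic used only so the example runs without a model.

```python
from itertools import permutations


def rank_by_pairwise_wins(query, docs, prefer):
    """Rank docs by pairwise wins; returns (ranked_docs, comparator_calls).

    prefer(query, a, b) should return True when a is judged more relevant
    than b. In an LLM setting each call would be one pairwise prompt.
    """
    wins = [0] * len(docs)
    calls = 0
    for i, j in permutations(range(len(docs)), 2):
        calls += 1
        if prefer(query, docs[i], docs[j]):
            wins[i] += 1
    order = sorted(range(len(docs)), key=lambda k: -wins[k])
    return [docs[k] for k in order], calls


def toy_prefer(query, a, b):
    # Hypothetical stand-in for the LLM: prefer the document that shares
    # more words with the query.
    q = set(query.lower().split())
    return len(q & set(a.lower().split())) > len(q & set(b.lower().split()))


docs = ["cats sleep", "ranking with language models", "language"]
ranked, calls = rank_by_pairwise_wins("language models ranking", docs, toy_prefer)
```

With n candidates this all-pairs scheme issues n(n-1) comparator calls, whereas a pointwise scorer needs only n, which is the kind of gap that distilling into a pointwise student is meant to close.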

Paper Structure

This paper contains 21 sections, 6 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: The average nDCG@10 of various LLM-based re-ranking methods on TREC benchmarks. The horizontal axis represents the speed of each method relative to monoT5-Base (Nogueira et al., 2020), measured as the average latency per query. All methods are based on the T5 series of foundation models. RG refers to the relevance generation method, and PRP refers to the pairwise ranking method.
  • Figure 2: An overview of the proposed instruction distillation approach. Instruction distillation distills the abilities harvested from complex instruction techniques into a model that is more efficient with simple instruction techniques.
  • Figure 3: Comparison of the proposed method with baselines in terms of model size. Our methods (the yellow line) outperform supervised fine-tuning (SFT) methods when the number of parameters exceeds 3B.