Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model

Nilanjan Sinhababu, Andrew Parry, Debasis Ganguly, Debasis Samanta, Pabitra Mitra

TL;DR

This work proposes a pairwise few-shot ranker that demonstrates consistent improvements over the zero-shot baseline on both in-domain (TREC DL) and out-domain (BEIR subset) retrieval benchmarks.

Abstract

Supervised ranking models are effective, but they usually involve complex processing, typically multiple stages of task-specific pre-training and fine-tuning. This has motivated researchers to explore simpler pipelines that leverage large language models (LLMs) capable of working in a zero-shot manner. However, since zero-shot inference does not make use of a training set of queries paired with their relevant documents, its performance is mostly worse than that of supervised models, which are trained on such example pairs. Motivated by existing findings that training examples generally improve zero-shot performance, we explore whether this also applies to ranking models. More specifically, given a query and a pair of documents, we improve the preference prediction task by augmenting the prompt with examples of preferences for similar queries from a training set. Our proposed pairwise few-shot ranker demonstrates consistent improvements over the zero-shot baseline on both in-domain (TREC DL) and out-domain (BEIR subset) retrieval benchmarks. Our method also achieves performance close to that of a supervised model without requiring any complex training pipeline.
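
The idea of adding preference examples for similar queries to a pairwise prompt can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the `TrainingExample` schema, the token-overlap similarity used to pick neighbouring queries, and the prompt wording are all hypothetical. The call to an actual LLM is left out.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    """Hypothetical schema: a training query with one relevant and one non-relevant passage."""
    query: str
    relevant: str
    non_relevant: str


def jaccard(a: str, b: str) -> float:
    """Token-set overlap of two strings, used here as a simple lexical similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def nearest_examples(query: str, train: list, k: int = 1) -> list:
    """Return the k training examples whose queries are most similar to `query`."""
    ranked = sorted(train, key=lambda ex: jaccard(query, ex.query), reverse=True)
    return ranked[:k]


def build_prompt(query: str, doc_a: str, doc_b: str, train: list, k: int = 1) -> str:
    """Assemble a few-shot pairwise preference prompt (illustrative wording only)."""
    blocks = []
    for ex in nearest_examples(query, train, k):
        # For brevity the relevant passage is always shown as Passage A;
        # a real setup would likely vary the order to limit position bias.
        blocks.append(
            f"Query: {ex.query}\n"
            f"Passage A: {ex.relevant}\n"
            f"Passage B: {ex.non_relevant}\n"
            "More relevant passage: A"
        )
    # The test-time pair goes last, with the answer left open for the LLM.
    blocks.append(
        f"Query: {query}\n"
        f"Passage A: {doc_a}\n"
        f"Passage B: {doc_b}\n"
        "More relevant passage:"
    )
    return "\n\n".join(blocks)


# Toy usage with placeholder strings (not data from the paper).
train = [
    TrainingExample("what causes rain", "relevant text 1", "non-relevant text 1"),
    TrainingExample("python list sort", "relevant text 2", "non-relevant text 2"),
]
prompt = build_prompt("what causes heavy rain", "candidate A", "candidate B", train, k=1)
print(prompt)
```

Setting `k=0` in this sketch yields the zero-shot form of the prompt, so the same function covers both settings that the abstract compares.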

Paper Structure (33 sections, 6 equations, 4 figures, 7 tables)


Figures (4)

  • Figure 1: Our proposed pairwise method for reranking a set of top-retrieved candidate documents via LLM-based inference. Unlike Qin et al. (2023), we provide additional context for LLM inference by including few-shot examples, each consisting of documents relevant to queries similar to the current input query, as retrieved from a training set.
  • Figure 2: An example prompt to illustrate the structure of the prompts used for few-shot PRP.
  • Figure 3: Sensitivity of Zephyr-$k$S to the number of few-shot examples.
  • Figure 4: Per-query analysis of the relation between the Jaccard similarity (JS) of the current query and its 1-shot example, and the $\Delta$nDCG@10 of Zephyr-1S relative to 0S, using the semantic and lexical neighborhoods for the in-domain and out-domain test sets. The Pearson correlation ($\rho$) is shown in each case.
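
For reference, the quantities named in the Figure 4 caption have the following standard definitions. The notation is an assumption introduced here, not taken from the paper: $T(\cdot)$ is the set of terms of a query, $q$ is the current query, $q'$ is the query of its 1-shot example, and $x_i$, $y_i$ are the per-query JS and $\Delta$nDCG@10 values over $n$ test queries.

$$
\mathrm{JS}(q, q') = \frac{|T(q) \cap T(q')|}{|T(q) \cup T(q')|},
\qquad
\Delta\mathrm{nDCG@10}(q) = \mathrm{nDCG@10}_{\text{1S}}(q) - \mathrm{nDCG@10}_{\text{0S}}(q)
$$

$$
\rho = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Under these definitions, a positive $\rho$ would mean that queries whose 1-shot example overlaps more with them tend to gain more from that example.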