ReFIT: Relevance Feedback from a Reranker during Inference

Revanth Gangi Reddy, Pradeep Dasigi, Md Arafat Sultan, Arman Cohan, Avirup Sil, Heng Ji, Hannaneh Hajishirzi

TL;DR

This work addresses the shortcoming of retrieve-and-rerank systems where Recall@K cannot be improved by reranking alone. It introduces ReFIT, an inference-time mechanism that distills the cross-encoder reranker’s relevance signals into a refreshed query representation for a second retrieval, achieving higher Recall@K while maintaining comparable latency. The approach is architecture- and modality-agnostic, showing gains across English domains, multilingual and cross-lingual settings, and multimodal retrieval, and it outperforms training-time distillation baselines and TouR in many scenarios. ReFIT enables practical, scalable improvements in retrieval effectiveness and opens avenues for integrating relevance feedback from larger models at inference time, including future work with LLM-based signals and interpretability of query updates.
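
The first claim can be checked with a toy example. The sketch below is illustrative only: the document IDs, the `recall_at_k` helper, and the rankings are made up, not taken from the paper. Because reranking only permutes the K retrieved candidates, it can raise recall at a smaller cutoff, but Recall@K itself stays fixed.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear among the first k ranked ids."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Hypothetical top-5 list from a retriever; "d3" is relevant but was never retrieved.
retrieved = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d1", "d3"}

# A reranker can only reorder the same five candidates.
reranked = ["d1", "d4", "d2", "d7", "d9"]

recall_1_before = recall_at_k(retrieved, relevant, 1)  # 0.0
recall_1_after = recall_at_k(reranked, relevant, 1)    # 0.5
recall_5_before = recall_at_k(retrieved, relevant, 5)  # 0.5
recall_5_after = recall_at_k(reranked, relevant, 5)    # 0.5, unchanged by reranking
```

Recovering "d3" requires a new retrieval pass, which is the gap the second retrieval step targets.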

Abstract

Retrieve-and-rerank is a prevalent framework in neural information retrieval, wherein a bi-encoder network initially retrieves a pre-defined number of candidates (e.g., K=100), which are then reranked by a more powerful cross-encoder model. While the reranker often yields improved candidate scores compared to the retriever, its scope is confined to only the top K retrieved candidates. As a result, the reranker cannot improve retrieval performance in terms of Recall@K. In this work, we propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time. Specifically, given a test instance during inference, we distill the reranker's predictions for that instance into the retriever's query representation using a lightweight update mechanism. The aim of the distillation loss is to align the retriever's candidate scores more closely with those produced by the reranker. The algorithm then proceeds by executing a second retrieval step using the updated query vector. We empirically demonstrate that this method, applicable to various retrieve-and-rerank frameworks, substantially enhances retrieval recall across multiple domains, languages, and modalities.
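
The four steps described above (retrieve, rerank, distill, retrieve again) can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the loss or optimizer, so the sketch assumes a KL-divergence loss over softmax-normalized scores and plain gradient descent on the query vector. The reranker is replaced by a stand-in that scores candidates against a hidden "ideal" direction, and all names (`refit_query`, `top_k`, `ideal`) are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def top_k(passage_vecs, query_vec, k):
    """Indices of the k passages with the highest dot-product score."""
    return np.argsort(-(passage_vecs @ query_vec))[:k]


def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred) for two discrete distributions."""
    return float(np.sum(target * (np.log(target + eps) - np.log(pred + eps))))


def refit_query(query_vec, cand_vecs, reranker_scores, n_steps=20, lr=1.0):
    """Distill reranker scores into the query vector by gradient descent.

    Only the query vector changes; the passage vectors stay fixed.
    """
    target = softmax(reranker_scores)
    q = np.array(query_vec, dtype=float)  # work on a copy
    for _ in range(n_steps):
        pred = softmax(cand_vecs @ q)
        # Gradient of KL(target || softmax(cand_vecs @ q)) with respect to q.
        q -= lr * (cand_vecs.T @ (pred - target))
    return q


# Toy corpus of random unit-norm passage vectors (synthetic, not from the paper).
rng = np.random.default_rng(0)
passages = rng.normal(size=(500, 16))
passages /= np.linalg.norm(passages, axis=1, keepdims=True)

# Hypothetical "ideal" direction seen only by the stand-in reranker,
# and a noisy version of it that plays the retriever's query vector.
ideal = rng.normal(size=16)
ideal /= np.linalg.norm(ideal)
query = ideal + 0.2 * rng.normal(size=16)

K = 20
first_pass = top_k(passages, query, K)                # step 1: retrieve
rerank_scores = 10.0 * (passages[first_pass] @ ideal)  # step 2: stand-in reranker
new_query = refit_query(query, passages[first_pass], rerank_scores)  # step 3: distill
second_pass = top_k(passages, new_query, K)           # step 4: retrieve again

target = softmax(rerank_scores)
kl_before = kl_divergence(target, softmax(passages[first_pass] @ query))
kl_after = kl_divergence(target, softmax(passages[first_pass] @ new_query))
print(f"KL before: {kl_before:.4f}, KL after: {kl_after:.4f}")
```

In this sketch the update touches only the query vector, so the passage index does not need to be rebuilt between the two retrieval steps. The unit-norm passages and `lr=1.0` are choices made for the toy setup so that each gradient step reduces the loss; they are not values from the paper.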

Paper Structure

This paper contains 31 sections, 7 equations, 4 figures, 11 tables, 1 algorithm.

Figures (4)

  • Figure 1: ReFIT: The proposed method for reranker relevance feedback. We introduce an inference-time distillation process (step 3) into the traditional retrieve-and-rerank framework (steps 1 and 2) to compute a new query vector, which improves recall when used for a second retrieval step (step 4).
  • Figure 2: t-SNE plots for some examples from BEIR, with the query vectors shown alongside the corresponding positive passages. The updated query vectors after ReFIT are now closer to the positive passages (in green).
  • Figure 3: Plot showing the variation of ReFIT performance (R@100) with the number of distillation updates $n$ (where $\Delta$ is % increase in latency on CPU).
  • Figure 4: Plot showing the variation of ReFIT performance (R@100) with the number of reranked passages $K$ used for distillation supervision.