ReFIT: Relevance Feedback from a Reranker during Inference
Revanth Gangi Reddy, Pradeep Dasigi, Md Arafat Sultan, Arman Cohan, Avirup Sil, Heng Ji, Hannaneh Hajishirzi
TL;DR
In retrieve-and-rerank systems, the reranker only reorders the top K retrieved candidates, so reranking alone cannot improve Recall@K. This work introduces ReFIT, an inference-time mechanism that distills the cross-encoder reranker's relevance signals into an updated query representation, which is then used for a second retrieval step, achieving higher Recall@K at comparable latency. The approach is architecture- and modality-agnostic: it shows gains across English domains, multilingual and cross-lingual settings, and multimodal retrieval, and it outperforms training-time distillation baselines and TouR in many scenarios. ReFIT offers a practical way to improve retrieval effectiveness and opens avenues for integrating relevance feedback from larger models at inference time, including future work on LLM-based signals and on the interpretability of query updates.
Abstract
Retrieve-and-rerank is a prevalent framework in neural information retrieval, wherein a bi-encoder network initially retrieves a pre-defined number of candidates (e.g., K=100), which are then reranked by a more powerful cross-encoder model. While the reranker often yields improved candidate scores compared to the retriever, its scope is confined to only the top K retrieved candidates. As a result, the reranker cannot improve retrieval performance in terms of Recall@K. In this work, we propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time. Specifically, given a test instance during inference, we distill the reranker's predictions for that instance into the retriever's query representation using a lightweight update mechanism. The aim of the distillation loss is to align the retriever's candidate scores more closely with those produced by the reranker. The algorithm then proceeds by executing a second retrieval step using the updated query vector. We empirically demonstrate that this method, applicable to various retrieve-and-rerank frameworks, substantially enhances retrieval recall across multiple domains, languages, and modalities.
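The inference-time loop described in the abstract (retrieve, rerank, distill the reranker's scores into the query vector, retrieve again) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the loss, so a KL divergence between softmax-normalized score distributions is assumed here, with plain gradient descent on the query vector only. The function names, the learning rate, the step count, and the simulated "reranker" (scores derived from a hidden ideal query vector over random unit-norm embeddings) are all hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def top_k(query_vec, passage_embs, k):
    # Dot-product retrieval: indices of the k highest-scoring passages.
    return np.argsort(-(passage_embs @ query_vec))[:k]

def kl_divergence(p, q, eps=1e-12):
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def update_query(query_vec, cand_embs, reranker_scores, lr=0.05, steps=100):
    # Assumed distillation step: minimize KL(softmax(reranker) || softmax(retriever))
    # with respect to the query vector; passage embeddings stay fixed.
    target = softmax(reranker_scores)
    q = query_vec.copy()
    for _ in range(steps):
        pred = softmax(cand_embs @ q)
        q -= lr * (cand_embs.T @ (pred - target))  # analytic gradient of the KL
    return q

def demo(seed=0, n_passages=200, dim=16, k=10):
    # Toy setup (illustrative only): random unit-norm embeddings and a
    # simulated reranker that scores candidates with a hidden query vector.
    rng = np.random.default_rng(seed)
    passages = rng.normal(size=(n_passages, dim))
    passages /= np.linalg.norm(passages, axis=1, keepdims=True)
    query = rng.normal(size=dim)
    hidden_ideal_query = query + rng.normal(size=dim)

    first = top_k(query, passages, k)                    # first retrieval
    cands = passages[first]
    reranker_scores = cands @ hidden_ideal_query         # stand-in for a cross-encoder
    target = softmax(reranker_scores)

    kl_before = kl_divergence(target, softmax(cands @ query))
    new_query = update_query(query, cands, reranker_scores)
    kl_after = kl_divergence(target, softmax(cands @ new_query))

    second = top_k(new_query, passages, k)               # second retrieval
    return kl_before, kl_after, first, second
```

Only the query vector is updated, so the passage index does not need to be rebuilt; the second retrieval simply reuses it with the new vector. Whether the second retrieval actually improves Recall@K depends on the reranker's quality and is not something this toy example demonstrates.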
