TPRF: A Transformer-based Pseudo-Relevance Feedback Model for Efficient and Effective Retrieval
Hang Li, Chuting Yu, Ahmed Mourad, Bevan Koopman, Guido Zuccon
TL;DR
The paper addresses the challenge of applying pseudo-relevance feedback (PRF) to dense retrievers in resource-constrained environments by proposing TPRF, a compact transformer-based PRF model that operates on dense feedback vectors rather than text. TPRF feeds the initial query representation and the top-k dense passage representations through a vanilla transformer encoder to produce a refined query, which is then used for a second retrieval pass. Compared against ANCE and its PRF variants, TPRF trades some effectiveness for a substantially smaller memory footprint and lower CPU latency, with the best configurations yielding sub-second latency on CPUs and model sizes below 600 MB. The approach scales to large PRF depths and is suitable for deployment on embedded systems and affordable cloud instances, offering a practical path to efficient and effective retrieval with PRF.
Abstract
This paper considers Pseudo-Relevance Feedback (PRF) methods for dense retrievers in resource-constrained environments, such as cheap cloud instances or embedded systems (e.g., smartphones and smartwatches), where memory and CPU are limited and GPUs are not present. To this end, we propose a transformer-based PRF method (TPRF), which has a much smaller memory footprint and faster inference than other deep language models that employ PRF mechanisms, at the cost of a marginal loss in effectiveness. TPRF learns how to effectively combine the relevance feedback signals from dense passage representations. Specifically, TPRF provides a mechanism for modelling the relationships between the query and the relevance feedback signals and for weighting them. The method is agnostic to the specific dense representation used and can thus be applied to any dense retriever.
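To make the two-pass flow concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes inner-product retrieval over a toy in-memory index and replaces the trained transformer encoder with a single untrained self-attention layer whose weights are random. All names (`retrieve`, `attention_refine`, `prf_search`, `prf_depth`) are hypothetical and chosen for illustration only.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    # matrix is a list of rows; returns matrix @ vector
    return [dot(row, vector) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(query, index, k):
    """Return the ids of the k passages with the highest inner product."""
    ranked = sorted(range(len(index)), key=lambda i: dot(query, index[i]), reverse=True)
    return ranked[:k]


def attention_refine(query, feedback, w_q, w_k, w_v):
    """Attend from the query over [query; feedback vectors].

    Assumption: one attention head with a residual connection stands in
    for the full (trained) transformer encoder described in the paper.
    """
    tokens = [query] + feedback
    q = matvec(w_q, query)
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, key) / scale for key in keys])
    attended = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(query))]
    refined = [a + b for a, b in zip(query, attended)]
    return refined, weights


def prf_search(query, index, w_q, w_k, w_v, prf_depth=3, k=5):
    """First pass -> collect dense feedback vectors -> refine query -> second pass."""
    first_pass = retrieve(query, index, prf_depth)
    feedback = [index[i] for i in first_pass]
    refined, weights = attention_refine(query, feedback, w_q, w_k, w_v)
    return retrieve(refined, index, k), refined, weights


def random_matrix(dim, rng):
    return [[rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]


# Toy data: 20 random "passage" vectors and one random "query" vector.
rng = random.Random(0)
dim = 8
index = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
query = [rng.gauss(0, 1) for _ in range(dim)]
w_q, w_k, w_v = (random_matrix(dim, rng) for _ in range(3))

results, refined, weights = prf_search(query, index, w_q, w_k, w_v)
```

Because the feedback signal consists of passage vectors that already sit in the index, the refinement step never re-encodes passage text, which is where the memory and latency savings described above come from. With trained weights, the attention distribution in `weights` would play the role of the learned weighting between the query and each feedback passage.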
