TPRF: A Transformer-based Pseudo-Relevance Feedback Model for Efficient and Effective Retrieval

Hang Li; Chuting Yu; Ahmed Mourad; Bevan Koopman; Guido Zuccon

TL;DR

The paper addresses the challenge of applying pseudo-relevance feedback (PRF) to dense retrievers in resource-constrained environments by proposing TPRF, a compact transformer-based PRF model that operates on dense feedback vectors rather than text. TPRF feeds the initial query representation and the top-k dense passage representations into a vanilla transformer encoder to produce a refined query representation, which is then used for a second retrieval pass. Compared with ANCE and its PRF variants, TPRF trades some effectiveness for a substantially smaller memory footprint and lower CPU latency; the best configurations achieve sub-second latency on CPUs with model sizes below 600 MB. The approach scales to large PRF depths and suits deployment on embedded systems and affordable cloud instances, offering a practical path to efficient and effective retrieval with PRF.
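The two-pass flow summarised above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the trained transformer encoder is replaced by a single self-attention step with identity projections, the vectors are random, and every name (`retrieve`, `refine_query`, `tprf_style_search`) is hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, passage_vecs, k):
    """Return the indices of the k passages with the highest inner product."""
    scores = [dot(query_vec, p) for p in passage_vecs]
    order = sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def refine_query(query_vec, feedback_vecs):
    """Toy stand-in for the TPRF encoder (assumption: one attention step,
    identity projections, no training). The input sequence is the query
    vector followed by the feedback passage vectors; the output at the
    query position is returned as the refined query."""
    seq = [query_vec] + feedback_vecs
    scale = math.sqrt(len(query_vec))
    weights = softmax([dot(query_vec, v) / scale for v in seq])
    refined = [sum(w * v[j] for w, v in zip(weights, seq))
               for j in range(len(query_vec))]
    return refined, weights


def tprf_style_search(query_vec, passage_vecs, prf_depth, final_k):
    """First pass, refinement over dense vectors only, then second pass."""
    first_pass = retrieve(query_vec, passage_vecs, prf_depth)
    feedback = [passage_vecs[i] for i in first_pass]
    refined, weights = refine_query(query_vec, feedback)
    return retrieve(refined, passage_vecs, final_k), weights


random.seed(0)
DIM = 8  # toy dimensionality, chosen only to keep the example small
passages = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(50)]
query = [random.gauss(0, 1) for _ in range(DIM)]
ranking, attn = tprf_style_search(query, passages, prf_depth=3, final_k=10)
```

Here `ranking` holds the second-pass passage indices and `attn` holds the weights assigned to the query and each feedback vector. No passage text is touched at any point, which is what allows the refinement model to stay small.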

Abstract

This paper considers Pseudo-Relevance Feedback (PRF) methods for dense retrievers in resource-constrained environments such as cheap cloud instances or embedded systems (e.g., smartphones and smartwatches), where memory and CPU are limited and GPUs are not present. For this setting, we propose a transformer-based PRF method (TPRF), which has a much smaller memory footprint and faster inference than other deep language models that employ PRF mechanisms, at the cost of a marginal effectiveness loss. TPRF learns how to effectively combine the relevance feedback signals from dense passage representations. Specifically, TPRF provides a mechanism for modelling relationships and weights between the query and the relevance feedback signals. The method is agnostic to the specific dense representation used and thus can be applied to any dense retriever.
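To give a rough sense of why a small encoder over dense vectors keeps the memory footprint low, the sketch below counts the parameters of a standard transformer encoder layer. It rests on assumptions not stated in this summary: a hidden size of 768, a feed-forward size of 3072, and 32-bit weights. The function names are hypothetical and the figures are not the paper's reported model sizes.

```python
def encoder_layer_params(d_model, d_ff):
    """Parameter count of one standard transformer encoder layer."""
    # Query, key, value and output projections, each with a bias vector.
    attention = 4 * (d_model * d_model + d_model)
    # Two linear maps of the position-wise feed-forward block, with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return attention + feed_forward + layer_norms


def encoder_size_mb(num_layers, d_model=768, d_ff=3072, bytes_per_param=4):
    """Approximate weight storage in MB (assumed sizes and float32 weights)."""
    return num_layers * encoder_layer_params(d_model, d_ff) * bytes_per_param / 1e6


per_layer = encoder_layer_params(768, 3072)
sizes_mb = {n: round(encoder_size_mb(n), 1) for n in (1, 2, 4, 8)}
```

In a standard multi-head attention layer the number of heads splits the same projection matrices, so it does not change the parameter count; under these assumptions the model size grows with the number of layers only.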

Paper Structure (14 sections, 1 equation, 2 figures, 1 table)

Introduction
Methods
Transformer-based PRF
Training TPRF with Hard Negative Sampling
Theoretical Advantages of TPRF
Experimental Setup
Datasets and Evaluation Metrics
Models
TPRF Implementation and Training
Results
Overall Effectiveness
Trade-off between Effectiveness, Model Size and Query Latency
Query Latency and Scalability to PRF Depth
Conclusion

Figures (2)

Figure 1: Relationship between effectiveness (measured as nDCG@10 and R@1000), query latency and model size for PRF methods, across datasets. For ANCE-TPRF, we display different configurations (w.r.t. number of layers and attention heads).
Figure 2: Query latency for the PRF models as a function of the PRF depth $k$.
