Generative Relevance Feedback and Convergence of Adaptive Re-Ranking: University of Glasgow Terrier Team at TREC DL 2023

Andrew Parry; Thomas Jaenich; Sean MacAvaney; Iadh Ounis

TL;DR

The paper addresses scalable retrieval enhancements by combining generative relevance feedback (Gen-QR and Gen-PRF) with graph-based adaptive re-ranking (GAR) in the context of TREC DL 2023. It evaluates zero-shot Gen-QR/Gen-PRF over BM25 and SPLADE first-stage retrievers and applies GAR on a BM25 corpus graph G = (V,E) with budget 5000 and 32 nearest neighbours, scored by a cross-encoder S such as monoELECTRA. Key findings show that Gen-PRF with GAR yields the strongest P@10 and nDCG@10, though SPLADE often achieves higher recall and MAP; with large budgets, a lexical first-stage model can approximate the performance of a learned retriever, as evidenced by increasing RBO correlations up to around 0.80. The results demonstrate the generalizability of zero-shot generative expansions to new test sets and reveal graph-based re-ranking as a viable pathway to reduce reliance on expensive first-stage models for practical, scalable retrieval systems.
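The adaptive re-ranking step summarised above can be illustrated with a small sketch. It follows the general GAR idea of alternating between the first-stage ranking and the graph neighbours of documents already scored, stopping once the scoring budget is spent. This is not the authors' implementation: the function name `adaptive_rerank`, the `score` callable (a stand-in for a cross-encoder such as monoELECTRA, assumed to close over the query), the `batch_size` default and the toy data are all assumptions made for illustration. Only the budget of 5000 and the 32 neighbours come from the text.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Tuple


def adaptive_rerank(
    initial: Sequence[str],
    neighbours: Dict[str, List[str]],
    score: Callable[[List[str]], List[float]],
    budget: int = 5000,    # maximum number of documents to score (from the text)
    k: int = 32,           # neighbours considered per document (from the text)
    batch_size: int = 16,  # assumption: not stated in the text
) -> List[Tuple[str, float]]:
    """Alternate between first-stage candidates and the graph frontier."""
    scored: Dict[str, float] = {}
    pool = deque(initial)             # first-stage candidates, best first
    frontier: Dict[str, float] = {}   # unscored neighbour -> best linking score
    use_frontier = False

    while len(scored) < budget and (pool or frontier):
        size = min(batch_size, budget - len(scored))
        if (use_frontier and frontier) or not pool:
            # Take the neighbours of the highest-scoring documents so far.
            batch = sorted(frontier, key=frontier.get, reverse=True)[:size]
            for doc in batch:
                del frontier[doc]
        else:
            # Take the next unscored documents from the first-stage ranking.
            batch = []
            while pool and len(batch) < size:
                doc = pool.popleft()
                if doc not in scored and doc not in batch:
                    batch.append(doc)
                    frontier.pop(doc, None)
        use_frontier = not use_frontier
        if not batch:
            continue
        for doc, s in zip(batch, score(batch)):
            scored[doc] = s
            # Add unscored neighbours to the frontier, keyed by the source score.
            for n in neighbours.get(doc, [])[:k]:
                if n not in scored:
                    frontier[n] = max(frontier.get(n, float("-inf")), s)

    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Toy example: "e" is not in the first-stage ranking but is reachable via "c".
toy_scores = {"a": 0.9, "b": 0.1, "c": 0.8, "d": 0.2, "e": 0.95}
toy_graph = {"a": ["c"], "b": ["d"], "c": ["e"]}
ranking = adaptive_rerank(
    ["a", "b"],
    toy_graph,
    score=lambda docs: [toy_scores[d] for d in docs],
    budget=4,
    batch_size=1,
)
```

In the toy example the budget of 4 is spent on "a", "c", "b" and "e", so the document "e", which the first stage missed, is found through the graph, while "d" is never scored.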

Abstract

This paper describes our participation in the TREC 2023 Deep Learning Track. We submitted runs that apply generative relevance feedback from a large language model in both zero-shot and pseudo-relevance feedback settings over two sparse retrieval approaches, namely BM25 and SPLADE. We couple this first stage with adaptive re-ranking over a BM25 corpus graph scored using a monoELECTRA cross-encoder. We investigate the efficacy of these generative approaches for different query types in first-stage retrieval. In re-ranking, we investigate operating points of adaptive re-ranking with different first stages to find the point in graph traversal where the first stage no longer affects the performance of the overall retrieval pipeline. We find some performance gains from the application of generative query reformulation. However, our strongest run in terms of P@10 and nDCG@10, namely uogtr_b_grf_e_gb, applied both adaptive re-ranking and generative pseudo-relevance feedback.
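One way to probe the convergence point described above is to compare, at each budget, the rankings produced from two different first stages. The TL;DR reports this comparison in terms of RBO (rank-biased overlap). The sketch below computes a truncated RBO between two ranked lists. It is a minimal illustration under stated assumptions: the function name `rbo_truncated`, the persistence `p = 0.9` and the truncation at a fixed depth (rather than an extrapolated variant) are choices made here, not details taken from the paper, and each list is assumed to contain no duplicates.

```python
from typing import Optional, Sequence


def rbo_truncated(
    run_a: Sequence[str],
    run_b: Sequence[str],
    p: float = 0.9,               # assumption: persistence is not given in the text
    depth: Optional[int] = None,  # defaults to the shorter list's length
) -> float:
    """Truncated rank-biased overlap: (1 - p) * sum_d p^(d-1) * A_d.

    A_d is the fraction of items shared by the two top-d prefixes.
    Truncation means identical lists of length k score 1 - p**k, not 1.
    """
    if depth is None:
        depth = min(len(run_a), len(run_b))
    total = 0.0
    for d in range(1, depth + 1):
        agreement = len(set(run_a[:d]) & set(run_b[:d])) / d
        total += p ** (d - 1) * agreement
    return (1 - p) * total


# Hypothetical rankings from two first stages after re-ranking.
same = rbo_truncated(["a", "b", "c"], ["a", "b", "c"])
swapped = rbo_truncated(["a", "b"], ["b", "a"])
disjoint = rbo_truncated(["a", "b", "c"], ["x", "y", "z"])
```

Because the weights decay geometrically with rank, disagreement near the top of the rankings lowers the score more than disagreement further down.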

Paper Structure

This paper contains 19 sections, 5 equations, 2 figures, and 1 table.

Introduction
PyTerrier Retrieval Pipelines
Methods
Generative Query Reformulation & Pseudo-Relevance Feedback
Graph-based Adaptive Re-Ranking (GAR)
Experimental Setup
Retrieval:
Query Expansion:
Submitted Runs
Baseline Runs
Submitted Group Runs
Additional Runs
Results & Analysis
Generative methods versus SPLADE
GAR is a stronger standalone method
...and 4 more sections

Figures (2)

Figure 1: Results for each main query type comparing BM25 >> monoELECTRA, QR(FLAN-T5) >> BM25 >> monoELECTRA, BM25 >> PRF$_\textit{Top-P}$(FLAN-T5) >> monoELECTRA and SPLADE >> monoELECTRA.
Figure 2: Contrasting adaptive re-ranking over BM25 and SPLADE first stages with full re-ranking at different budgets on TREC Deep Learning 2022 queries. Rankings are truncated to the top 100 results. Post-ranking, duplicates were added to the rankings to follow the current TREC procedure on MS MARCO-v2. Error bands are omitted from Figure (b) for clarity.
