Generative Relevance Feedback and Convergence of Adaptive Re-Ranking: University of Glasgow Terrier Team at TREC DL 2023

Andrew Parry, Thomas Jaenich, Sean MacAvaney, Iadh Ounis

TL;DR

The paper combines generative relevance feedback (Gen-QR and Gen-PRF) with graph-based adaptive re-ranking (GAR) for TREC DL 2023. It evaluates Gen-QR and Gen-PRF over BM25 and SPLADE first-stage retrievers and applies GAR on a BM25 corpus graph G = (V,E) with a budget of 5000 and 32 nearest neighbours, scored by a cross-encoder S (monoELECTRA). Gen-PRF with GAR yields the strongest P@10 and nDCG@10, though SPLADE often achieves higher recall and MAP. With large budgets, a lexical first-stage model can approximate the performance of a learned retriever: rank-biased overlap (RBO) between the resulting rankings rises to around 0.80. The results indicate that zero-shot generative expansions generalize to new test sets, and that graph-based re-ranking is a viable way to reduce reliance on expensive first-stage models in practical, scalable retrieval systems.
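The adaptive re-ranking loop summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `adaptive_rerank`, the toy graph, the dictionary-lookup scorer standing in for a cross-encoder, and the choice to prioritise frontier documents by the score of the document that surfaced them are all assumptions made for illustration. The budget and batch size are deliberately tiny.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def adaptive_rerank(
    initial: Sequence[str],
    neighbours: Dict[str, List[str]],
    score: Callable[[str], float],
    budget: int,
    batch_size: int = 1,
) -> List[Tuple[str, float]]:
    """Score up to `budget` documents, alternating between the first-stage
    pool and the graph frontier (neighbours of already-scored documents)."""
    scored: Dict[str, float] = {}
    pool = list(initial)              # unscored first-stage results, in rank order
    frontier: Dict[str, float] = {}   # unscored neighbour -> best score of a doc linking to it
    use_frontier = False
    while len(scored) < budget and (pool or frontier):
        if (use_frontier and frontier) or not pool:
            # Take the frontier documents whose source documents scored highest.
            batch = sorted(frontier, key=frontier.get, reverse=True)[:batch_size]
        else:
            batch = pool[:batch_size]
        batch = batch[: budget - len(scored)]  # never exceed the scoring budget
        for doc in batch:
            scored[doc] = score(doc)
            frontier.pop(doc, None)
        pool = [d for d in pool if d not in scored]
        # Expand the frontier with unscored neighbours of the new batch.
        for doc in batch:
            for n in neighbours.get(doc, []):
                if n not in scored:
                    frontier[n] = max(frontier.get(n, float("-inf")), scored[doc])
        use_frontier = not use_frontier
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): d3 and d5 are relevant but missed by the first stage.
toy_scores = {"d1": 0.9, "d2": 0.1, "d3": 0.8, "d4": 0.2, "d5": 0.7}
toy_graph = {"d1": ["d3"], "d2": ["d4"], "d3": ["d5"]}
ranking = adaptive_rerank(["d1", "d2"], toy_graph, toy_scores.get, budget=4)
print([doc for doc, _ in ranking])
```

In this toy run the graph surfaces d3 and d5, neither of which the first stage returned, while d4 is never scored because the budget is spent first. This is the mechanism by which a larger budget can make the final ranking less dependent on the first stage.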

Abstract

This paper describes our participation in the TREC 2023 Deep Learning Track. We submitted runs that apply generative relevance feedback from a large language model in both a zero-shot and pseudo-relevance feedback setting over two sparse retrieval approaches, namely BM25 and SPLADE. We couple this first stage with adaptive re-ranking over a BM25 corpus graph scored using a monoELECTRA cross-encoder. We investigate the efficacy of these generative approaches for different query types in first-stage retrieval. In re-ranking, we investigate operating points of adaptive re-ranking with different first stages to find the point in graph traversal where the first stage no longer has an effect on the performance of the overall retrieval pipeline. We find some performance gains from the application of generative query reformulation. However, our strongest run in terms of P@10 and nDCG@10 applied both adaptive re-ranking and generative pseudo-relevance feedback, namely uogtr_b_grf_e_gb.
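One way to quantify the point at which the first stage no longer affects the final ranking is to compare the rankings produced from two different first stages using rank-biased overlap (RBO). The sketch below computes a truncated RBO sum, which is a lower bound on the full measure. The persistence parameter `p`, the truncation at the shorter list, and the function name `rbo_truncated` are assumptions; the source does not state which RBO variant or parameters were used.

```python
from typing import Hashable, Sequence


def rbo_truncated(run_a: Sequence[Hashable], run_b: Sequence[Hashable], p: float = 0.9) -> float:
    """Truncated rank-biased overlap: (1 - p) * sum_d p^(d-1) * A_d,
    where A_d is the fraction of shared items in the top-d prefixes."""
    depth = min(len(run_a), len(run_b))
    total = 0.0
    for d in range(1, depth + 1):
        # Agreement at depth d: size of the prefix intersection divided by d.
        agreement = len(set(run_a[:d]) & set(run_b[:d])) / d
        total += p ** (d - 1) * agreement
    return (1 - p) * total


# Two hypothetical rankings that agree on the set but not the order of the top 2.
example = rbo_truncated(["a", "b", "c"], ["b", "a", "c"], p=0.5)
print(example)
```

Because the sum is truncated, two identical lists of depth k score 1 - p^k rather than 1; an extrapolated variant would be needed for values to be comparable across different depths.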

Paper Structure (19 sections, 5 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Results for each main query type comparing BM25 >> monoELECTRA, QR(FLAN-T5) >> BM25 >> monoELECTRA, BM25 >> PRF$_\textit{Top-P}$(FLAN-T5) >> monoELECTRA, and SPLADE >> monoELECTRA.
  • Figure 2: Contrasting adaptive re-ranking over BM25 and SPLADE first stages with full re-ranking at different budgets on TREC Deep Learning 2022 queries. Rankings are truncated to the top 100 results. Post-ranking, duplicates were added to the rankings to follow the current TREC procedure on MS MARCO-v2. Error bands are omitted from Figure (b) for clarity.