Revisiting Feedback Models for HyDE

Nour Jedidi, Jimmy Lin

TL;DR

The paper improves HyDE-based PRF for BM25 by revisiting classic feedback models. It systematically evaluates Avg Vector, Rocchio, and RM3 for selecting and weighting HyDE-generated feedback terms across 14 datasets, including MS MARCO and BEIR, with a term-selection step that filters corpus-wide terms before weighting. The findings show that Rocchio provides the strongest gains (up to 4.3 Recall@20 points), Avg Vector reduces noise compared to naive concatenation, and RM3 offers robustness on BEIR tasks, illustrating that traditional PRF methods remain valuable for enhancing LLM-based expansions. The work highlights the practical value of combining LLM-generated feedback with established weighting schemes and provides open-source code to facilitate future research and reproducibility. It also suggests adaptive weighting of the feedback parameters ($\alpha$, $\beta$, $\lambda$) as a promising direction.

Abstract

Recent approaches that leverage large language models (LLMs) for pseudo-relevance feedback (PRF) have generally not utilized well-established feedback models like Rocchio and RM3 when expanding queries for sparse retrievers like BM25. Instead, they often opt for a simple string concatenation of the query and LLM-generated expansion content. But is this optimal? To answer this question, we revisit and systematically evaluate traditional feedback models in the context of HyDE, a popular method that enriches query representations with LLM-generated hypothetical answer documents. Our experiments show that HyDE's effectiveness can be substantially improved when leveraging feedback algorithms such as Rocchio to extract and weight expansion terms, providing a simple way to further enhance the accuracy of LLM-based PRF methods.
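
To make the contrast with plain string concatenation concrete, the following is a minimal sketch of Rocchio-style term weighting applied to hypothetical documents. It is not the authors' implementation: the function names, the whitespace tokenizer, the length-normalized term-frequency vectors, and the default values of `alpha`, `beta`, and `top_k` are all assumptions chosen for illustration, and the term-selection filtering mentioned in the TL;DR is omitted.

```python
from collections import Counter


def tf_vector(text):
    """Length-normalized term-frequency vector (assumed representation)."""
    tokens = text.lower().split()  # naive whitespace tokenizer, for illustration only
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def rocchio_expand(query, hypothetical_docs, alpha=1.0, beta=0.75, top_k=10):
    """Return term weights alpha * q + beta * centroid(feedback docs).

    Only positive feedback is used (no negative-document term), and only the
    top_k highest-weighted centroid terms are kept as expansion terms.
    """
    q_vec = tf_vector(query)

    # Average the feedback document vectors into a centroid.
    centroid = Counter()
    for doc in hypothetical_docs:
        for term, weight in tf_vector(doc).items():
            centroid[term] += weight / len(hypothetical_docs)

    # Keep the top_k feedback terms; ties are broken alphabetically.
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    feedback = dict(ranked[:top_k])

    # Interpolate original query weights with feedback weights.
    return {
        term: alpha * q_vec.get(term, 0.0) + beta * feedback.get(term, 0.0)
        for term in set(q_vec) | set(feedback)
    }


if __name__ == "__main__":
    # Hypothetical documents stand in for LLM-generated HyDE passages.
    docs = [
        "tides are caused by the gravitational pull of the moon",
        "the moon gravitational pull causes ocean tides",
    ]
    weights = rocchio_expand("what causes tides", docs)
    for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{weight:.3f}")
```

Unlike concatenation, where every generated token enters the BM25 query with equal standing, this scheme keeps the original query terms dominant through `alpha` and lets `beta` control how much the generated content contributes. The resulting weights would still need to be passed to a retriever that supports weighted query terms.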

Paper Structure

This paper contains 8 sections, 4 equations, 1 table.