Exploring the Best Practices of Query Expansion with Large Language Models

Le Zhang; Yihong Wu; Qian Yang; Jian-Yun Nie

TL;DR

This work tackles the challenge of improving information retrieval through effective query expansion by leveraging large language models. It introduces MuGI, a training-free framework that generates multiple pseudo-references with LLMs and integrates them with queries via adaptive reweighting, contextual pooling, and pseudo relevance feedback calibration, applicable to both BM25 and dense bi-encoders. Empirical results show MuGI consistently enhances performance across in-domain and out-of-domain datasets, enabling small dense models (as few as 23M parameters) to rival larger baselines and achieving significant gains on benchmarks like TREC DL and BEIR. The findings highlight practical best practices for query expansion, including the use of multiple references, adaptive weighting, and a calibration step, while acknowledging inference-time costs and suggesting avenues for integration with Retrieval-Augmented Generation in future work.

Abstract

Large Language Models (LLMs) are foundational in language technologies, particularly in information retrieval (IR). Previous studies have used LLMs for query expansion and achieved notable improvements in IR. In this paper, we thoroughly explore best practices for leveraging LLMs for query expansion. To this end, we introduce a training-free, straightforward yet effective framework called Multi-Text Generation Integration (MuGI). It leverages LLMs to generate multiple pseudo-references and integrates them with queries to enhance both sparse and dense retrievers. Our empirical findings reveal that: (1) increasing the number of samples from LLMs benefits IR systems; (2) a balance between the query and pseudo-documents, together with an effective integration strategy, is critical for high performance; (3) contextual information from LLMs is essential, and can even enable a 23M-parameter model to outperform a 7B baseline model; (4) pseudo relevance feedback can further calibrate queries for improved performance; and (5) query expansion is widely applicable and versatile, consistently enhancing models ranging from 23M to 7B parameters. Our code and all generated references are available at https://github.com/lezhang7/Retrieval_MuGI
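
Finding (2), the balance between the query and the pseudo-documents, can be illustrated for a sparse retriever such as BM25 with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption for illustration: the function name `build_expanded_query`, whitespace tokenization, and the rule that the query is repeated roughly `total_reference_length / (query_length * beta)` times so that the short query is not drowned out by the much longer generated references.

```python
def build_expanded_query(query: str, references: list[str], beta: float = 4.0) -> str:
    """Hypothetical sketch of length-adaptive query reweighting.

    Assumption (not taken from the source): the query is repeated so that
    its token count stays at roughly 1/beta of the total token count of
    the pseudo-references, then everything is concatenated into one
    bag-of-words query string for a lexical retriever such as BM25.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")

    # Whitespace tokenization is a simplification chosen for this sketch.
    query_len = len(query.split())
    if query_len == 0:
        raise ValueError("query must contain at least one token")

    reference_len = sum(len(ref.split()) for ref in references)

    # Repeat the query at least once, and more often as references grow.
    repeats = max(1, round(reference_len / (query_len * beta)))

    return " ".join([query] * repeats + list(references))
```

Under these assumptions, a smaller `beta` repeats the query more often and so gives it more weight relative to the pseudo-references; with no references the function simply returns the original query.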

Paper Structure (33 sections, 6 equations, 7 figures, 4 tables)

Introduction
Related Work
Information Retrieval
LLMs for IR
Method
Preliminaries
Non-parametric Lexical-based Methods
Neural Dense Retrieval Methods
Multi-Text Generation Integration
MuGI for BM25
MuGI for Dense Retriever
Integration
Calibration
MuGI Pipeline
Experiments
...and 18 more sections

Figures (7)

Figure 1: Zero-Shot Prompting for Relevant Passage Generation: It emphasizes generating contextually relevant content to enhance background knowledge density for multiple-text integration.
Figure 2: Method overview of MuGI. The left part shows initial retrieval with BM25; the right part shows re-ranking of the first-stage output with a dense retriever.
Figure 3: BM25 + MuGI reweighting strategy results (nDCG@10), averaged over TREC DL and BEIR. The left panel shows constant repetition of the query, while the right panel shows our adaptive reweighting strategy with various $\beta$ values. The Y-axis represents the number of pseudo-references used.
Figure 4: BM25 + MuGI results (nDCG@10) across various LLMs and reweighting strategies, using 5 references.
Figure 5: Calibration ablation with $\alpha=0.2$ (nDCG@10). (Top) In-domain TREC DL evaluation; (Bottom) BEIR out-of-domain (OOD) evaluation. E5-M is E5-Mistral-instruct, BGE is BGE-Large-EN-v1.5, MLM is all-MiniLM-L6-v2, Ember is Ember-v1.
...and 2 more figures
