Automatic In-Domain Exemplar Construction and LLM-Based Refinement of Multi-LLM Expansions for Query Expansion

Minghan Li, Ercong Nie, Siqi Zhao, Tongna Chen, Huiping Huang, Guodong Zhou

TL;DR

This paper tackles the instability and non-scalability of LLM-driven query expansion (QE) under domain shift by introducing a fully automated, domain-adaptive QE framework. It constructs large in-domain exemplar pools via a BM25–MonoT5 pipeline, then selects diverse demonstrations with a simple clustering strategy, enabling training-free in-context learning. To exploit model complementarity, it proposes a two-LLM expansion ensemble whose outputs are synthesized by an LLM refinement module, eliminating the need for additional retrieval passes. Across DL20, DBPedia-Entity, and SciFact, the approach yields robust gains over lexical baselines and single-LLM prompts. The refined two-LLM ensemble provides the strongest, often statistically significant improvements, and the benefit also extends to dense retrieval setups.

Abstract

Query expansion with large language models is promising but often relies on hand-crafted prompts, manually chosen exemplars, or a single LLM, making it non-scalable and sensitive to domain shift. We present an automated, domain-adaptive QE framework that builds in-domain exemplar pools by harvesting pseudo-relevant passages using a BM25-MonoT5 pipeline. A training-free cluster-based strategy selects diverse demonstrations, yielding strong and stable in-context QE without supervision. To further exploit model complementarity, we introduce a two-LLM ensemble in which two heterogeneous LLMs independently generate expansions and a refinement LLM consolidates them into one coherent expansion. Across TREC DL20, DBPedia, and SciFact, the refined ensemble delivers consistent and statistically significant gains over BM25, Rocchio, zero-shot, and fixed few-shot baselines. The framework offers a reproducible testbed for exemplar selection and multi-LLM generation, and a practical, label-free solution for real-world QE.
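The cluster-based demonstration selection described above can be illustrated with a minimal sketch. The abstract does not specify the embedding model, the clustering algorithm, or how one exemplar is picked per cluster, so everything below is an assumption made for illustration: a toy bag-of-words embedding stands in for a real encoder, plain k-means stands in for the clustering step, and the pair nearest each centroid is taken as the demonstration. The names `embed`, `kmeans`, and `select_demonstrations` are hypothetical.

```python
import math
import random
from collections import Counter


def embed(text, vocab):
    """Toy L2-normalised bag-of-words vector (assumption: a real system
    would use a learned sentence encoder instead)."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (assumed clustering method); returns centroids and
    the cluster index assigned to each vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def select_demonstrations(pool, k=4, seed=0):
    """pool: list of (query, pseudo-relevant passage) pairs.
    Returns at most k pairs, one per non-empty cluster."""
    k = min(k, len(pool))
    vocab = sorted({w for q, _ in pool for w in q.lower().split()})
    vectors = [embed(q, vocab) for q, _ in pool]
    centroids, assign = kmeans(vectors, k, seed=seed)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            best = min(members, key=lambda i: sq_dist(vectors[i], centroids[c]))
            chosen.append(pool[best])
    return chosen
```

Picking one exemplar per cluster is one simple way to make the demonstrations cover different regions of the pool; the default `k=4` mirrors the four exemplar pairs mentioned in the Figure 2 caption.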

Paper Structure (24 sections, 3 equations, 4 figures, 3 tables, 1 algorithm)


Figures (4)

  • Figure 1: Overview of our automated pipeline for constructing domain-adaptive few-shot candidate pools, selecting cluster-based demonstrations, and performing two-LLM query expansion with LLM refinement.
  • Figure 2: Illustration of the two prompts used in our framework: (i) the expansion-generation prompt containing system instruction, four exemplar query–passage pairs, and the test query; and (ii) the refinement prompt that consolidates two candidate expansions into one coherent expansion.
  • Figure 3: Ablation on TREC DL20: comparison of FewShot4-Fixed vs. in-domain Cluster-ICL QE under two-LLM ensembles.
  • Figure 4: Comparison of single-LLM vs. two-LLM ensemble under the Cluster-ICL QE setting. Ensemble-Refine provides the strongest performance across both early-precision (NDCG@10) and deep-recall metrics.
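The two prompts from Figure 2 and the ensemble-then-refine flow from Figure 1 can be sketched as follows. This is only an illustration under stated assumptions: the prompt wording is a placeholder rather than the paper's actual text, the `LLM` type is a hypothetical "prompt in, text out" callable rather than any real client API, and appending the refined expansion to the original query is a common QE convention assumed here, not something the captions confirm.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


def build_expansion_prompt(demos: List[Tuple[str, str]], query: str) -> str:
    """System instruction, exemplar query-passage pairs, then the test query."""
    # Placeholder instruction; the paper's exact wording is not given here.
    parts = ["Write a passage that answers the given question."]
    for q, p in demos:
        parts.append(f"Query: {q}\nPassage: {p}")
    parts.append(f"Query: {query}\nPassage:")
    return "\n\n".join(parts)


def build_refinement_prompt(query: str, cand_a: str, cand_b: str) -> str:
    """Ask a refinement model to merge two candidate expansions into one."""
    return (
        "Merge the two candidate expansions into one coherent expansion.\n\n"
        f"Original query: {query}\n"
        f"Candidate A: {cand_a}\n"
        f"Candidate B: {cand_b}\n"
        "Merged expansion:"
    )


def expand_query(query: str, demos: List[Tuple[str, str]],
                 llm_a: LLM, llm_b: LLM, refiner: LLM) -> str:
    """Two models expand independently; a third consolidates their outputs."""
    prompt = build_expansion_prompt(demos, query)
    cand_a = llm_a(prompt)
    cand_b = llm_b(prompt)
    refined = refiner(build_refinement_prompt(query, cand_a, cand_b))
    # Assumption: the expanded query is the original query plus the refinement.
    return f"{query} {refined}"
```

Because the models are passed in as plain callables, the same function can be exercised with stubs, for example `expand_query("q", demos, lambda p: "a", lambda p: "b", lambda p: "a b")`, before wiring in real model clients.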