QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation

Amin Bigdeli, Radin Hamidi Rad, Mert Incesu, Negar Arabzadeh, Charles L. A. Clarke, Ebrahim Bagheri

TL;DR

The paper addresses the lack of a reusable, reproducible framework for LLM-based query reformulation in information retrieval (IR). It introduces QueryGym, a modular Python toolkit that standardizes reformulation methods, decouples them from retrieval backends, and centralizes prompt engineering and configuration. Key contributions include a unified Reformulation Framework, a retrieval-agnostic interface with Pyserini/PyTerrier wrappers, a versioned Prompt Bank, and built-in support for the BEIR and MS MARCO benchmarks. The framework enables rapid prototyping, fair multi-method benchmarking, and scalable deployment, making it practical for both research and real IR systems.

Abstract

We present QueryGym, a lightweight, extensible Python toolkit that supports large language model (LLM)-based query reformulation. This is an important tool development since recent work on LLM-based query reformulation has shown notable increases in retrieval effectiveness. However, while different authors have sporadically shared implementations of their methods, there is no unified toolkit that provides a consistent implementation of such methods, which hinders fair comparison, rapid experimentation, consistent benchmarking, and reliable deployment. QueryGym addresses this gap by providing a unified framework for implementing, executing, and comparing LLM-based reformulation methods. The toolkit offers: (1) a Python API for applying diverse LLM-based methods, (2) a retrieval-agnostic interface supporting integration with backends such as Pyserini and PyTerrier, (3) a centralized prompt management system with versioning and metadata tracking, (4) built-in support for benchmarks like BEIR and MS MARCO, and (5) a fully open-source, extensible implementation available to all researchers. QueryGym is publicly available at https://github.com/radinhamidi/QueryGym.
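
To make the design ideas in the abstract concrete, the following is a minimal, self-contained sketch of how a versioned prompt store and a retrieval-agnostic reformulation step could fit together. It is not QueryGym's actual API: every name here (`Prompt`, `PromptBank`, `Searcher`, `ExpansionReformulator`, `retrieve`, `fake_llm`, `KeywordSearcher`) is hypothetical, the LLM is replaced by a stub that returns fixed text, and the backend is a toy term-overlap searcher standing in for something like Pyserini or PyTerrier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class Prompt:
    """Hypothetical prompt record: a template plus version metadata."""
    name: str
    version: str
    template: str  # formatted with str.format and a {query} placeholder


class PromptBank:
    """Hypothetical in-memory prompt store keyed by (name, version)."""

    def __init__(self) -> None:
        self._prompts: Dict[Tuple[str, str], Prompt] = {}

    def register(self, prompt: Prompt) -> None:
        self._prompts[(prompt.name, prompt.version)] = prompt

    def get(self, name: str, version: str) -> Prompt:
        return self._prompts[(name, version)]


class Searcher(Protocol):
    """Any object with search(query, k) can act as a retrieval backend."""

    def search(self, query: str, k: int) -> List[str]: ...


class ExpansionReformulator:
    """Appends LLM-generated text to the original query (query expansion)."""

    def __init__(self, llm: Callable[[str], str], prompt: Prompt) -> None:
        self.llm = llm
        self.prompt = prompt

    def reformulate(self, query: str) -> str:
        generated = self.llm(self.prompt.template.format(query=query))
        return f"{query} {generated}"


def retrieve(query: str, reformulator: ExpansionReformulator,
             searcher: Searcher, k: int = 10) -> List[str]:
    """Reformulate first, then hand the new query to whichever backend is given."""
    return searcher.search(reformulator.reformulate(query), k)


def fake_llm(prompt_text: str) -> str:
    """Stub standing in for a real LLM call; always returns the same text."""
    return "neural ranking models"


class KeywordSearcher:
    """Toy backend: ranks documents by the number of shared query terms."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs

    def search(self, query: str, k: int) -> List[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(text.lower().split())), doc_id)
                  for doc_id, text in self.docs.items()]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for score, doc_id in scored if score > 0][:k]


bank = PromptBank()
bank.register(Prompt("expand", "v1", "Write related keywords for: {query}"))

docs = {
    "d1": "neural ranking models for search",
    "d2": "cooking pasta at home",
    "d3": "learning to rank",
}
searcher = KeywordSearcher(docs)
reformulator = ExpansionReformulator(fake_llm, bank.get("expand", "v1"))

baseline = searcher.search("rank documents", 10)
expanded = retrieve("rank documents", reformulator, searcher)
```

Because `retrieve` depends only on the `search(query, k)` shape, swapping the toy searcher for a wrapper around a real backend would not require changes to the reformulator, and pinning the prompt by name and version keeps runs comparable across methods.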

Paper Structure

This paper contains 4 sections and 4 figures.

Figures (4)

  • Figure 1: Inheritance hierarchy for the main classes in the QueryGym Python package.
  • Figure 2: Example usage of QueryGym for query reformulation.
  • Figure 3: Integrated pipeline for Pyserini retrieval and QueryGym reformulation.
  • Figure 4: Multi-method benchmarking pipeline across datasets under controlled conditions.