ReFeed: Retrieval Feedback-Guided Dataset Construction for Style-Aware Query Rewriting

Jiyoon Myung; Jungki Son; Kyungro Lee; Jihyeon Park; Joohyung Han

TL;DR

This work introduces a retrieval feedback-driven dataset generation framework that automatically identifies failed retrieval cases, leverages large language models to rewrite queries in the style of relevant documents, and verifies improvement through re-retrieval.

Abstract

Retrieval systems often fail when user queries differ stylistically or semantically from the language used in domain documents. Query rewriting has been proposed to bridge this gap, improving retrieval by reformulating user queries into semantically equivalent forms. However, most existing approaches overlook the stylistic characteristics of target documents (their domain-specific phrasing, tone, and structure), which are crucial for matching real-world data distributions. We introduce a retrieval feedback-driven dataset generation framework that automatically identifies failed retrieval cases, leverages large language models to rewrite queries in the style of relevant documents, and verifies improvement through re-retrieval. The resulting corpus of (original, rewritten) query pairs enables the training of rewriter models that are explicitly aware of document style and retrieval feedback. This work highlights a new direction in data-centric information retrieval, emphasizing how feedback loops and document-style alignment can enhance the reasoning and adaptability of retrieval-augmented generation (RAG) systems in real-world, domain-specific contexts.
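
The loop described in the abstract (detect a failed retrieval, rewrite the query in the style of the relevant document, keep the pair only if re-retrieval succeeds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the token-overlap retriever, the `GLOSSARY` lookup that stands in for the LLM rewriter, the top-1 success criterion, and every name and data item below are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def retrieve(query: str, corpus: Dict[str, str], k: int = 1) -> List[str]:
    """Toy retriever (assumption): rank documents by token overlap."""
    q = Counter(tokenize(query))
    scores = {
        doc_id: sum((q & Counter(tokenize(text))).values())
        for doc_id, text in corpus.items()
    }
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


# Hypothetical stand-in for the LLM rewriter: swap a query term for a
# glossary term only when that term actually occurs in the gold document.
GLOSSARY = {"money": "reimbursement", "work": "business", "trips": "travel"}


def toy_rewrite(query: str, doc_text: str) -> str:
    doc_tokens = set(tokenize(doc_text))
    out = []
    for tok in tokenize(query):
        target = GLOSSARY.get(tok)
        out.append(target if target in doc_tokens else tok)
    return " ".join(out)


def build_pairs(
    examples: List[Tuple[str, str]],
    corpus: Dict[str, str],
    rewrite: Callable[[str, str], str],
    k: int = 1,
) -> List[Tuple[str, str]]:
    """Collect (original, rewritten) pairs that pass re-retrieval."""
    pairs = []
    for query, gold_id in examples:
        if gold_id in retrieve(query, corpus, k):
            continue  # initial retrieval already succeeded
        candidate = rewrite(query, corpus[gold_id])
        if gold_id in retrieve(candidate, corpus, k):
            pairs.append((query, candidate))  # verified improvement
        # otherwise the rewrite is discarded
    return pairs


corpus = {
    "d1": "reimbursement procedure for business travel expenditure",
    "d2": "how to get money from an atm",
}
examples = [
    ("get money back for work trips", "d1"),  # miss, fixed by rewrite
    ("how to get money from an atm", "d2"),   # hit, skipped
    ("how to get lunch money", "d1"),         # miss, rewrite fails check
]
pairs = build_pairs(examples, corpus, toy_rewrite)
```

In this toy run only the first example survives: its rewrite moves the gold document to rank 1, while the third example's rewrite still retrieves the wrong document and is dropped. Filtering on re-retrieval is what keeps unhelpful rewrites out of the assembled dataset.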

Paper Structure (18 sections, 1 figure, 2 tables)

This paper contains 18 sections, 1 figure, 2 tables.

Introduction
Related Work
Query Rewriting for Information Retrieval.
Feedback-Driven Query Reformulation.
Existing Query Rewriting Datasets.
Methodology
Initial Retrieval
LLM-Guided Rewriting
Verification via Re-Retrieval
Dataset Assembly
Experiments
Setup
Dataset Construction
Qualitative Examples.
Few-Shot Validation with Constructed Dataset
...and 3 more sections

Figures (1)

Figure 1: Overview of the retrieval feedback-driven dataset generation framework. Missed queries are rewritten via LLMs to match the style of the correct documents and validated through re-retrieval before being assembled into a style-aware corpus.
