
Relevance to Utility: Process-Supervised Rewrite for RAG

Jaeyoung Kim, Jongho Kim, Seung-won Hwang, Seoho Song, Young-In Song

TL;DR

This paper tackles the gap between retrieval relevance and generative utility in RAG by treating document rewriting as an integral part of the reasoning process. It introduces R2U, which uses a joint rewrite–answer perspective, scaled process supervision, and soft utility-based labeling to distill a utility-aligned rewriter. Empirical results across multiple QA benchmarks show that R2U consistently improves performance over strong baselines, including in multi-hop and web-scale settings, and generalizes to diverse generators. The work demonstrates that aligning rewriting with downstream reasoning via joint traces and soft utility signals yields substantial gains with relatively compact models, offering a practical approach for robust open-domain QA with RAG.

Abstract

Retrieval-augmented generation (RAG) systems often suffer from a gap between optimizing retrieval relevance and generative utility. Because of this gap, retrieved documents may be topically relevant yet still lack the content needed for effective reasoning during generation. Existing bridge modules attempt to rewrite the retrieved text for better generation, but we show that they fail because they do not capture "document utility". In this work, we propose R2U, whose key distinction is approximating true utility through joint observation of rewriting and answering in the reasoning process. To make distillation more reliable, R2U scales such supervision. We further construct utility-improvement supervision by measuring the generator's gain on the answer under the rewritten context, yielding signals for fine-tuning and preference optimization. We evaluate our method across multiple open-domain question-answering benchmarks. The empirical results demonstrate consistent improvements over strong bridging baselines.
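The utility-improvement supervision described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the soft utility improvement $\Delta \ell$ is the change in the generator's log-likelihood of the gold answer when a rewrite $d'$ replaces the original document $d$. It also assumes several candidate rewrites are sampled per document and are split into win and lose sets by a threshold on $\Delta \ell$. The names `AnswerLogProb`, `Rewrite`, `label_rewrites`, `split_win_lose`, `build_training_data`, and `toy_score`, as well as the threshold of 0, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical generator interface: returns log p(answer | query, context).
AnswerLogProb = Callable[[str, str, str], float]


@dataclass
class Rewrite:
    doc_id: int          # index of the source document in the retrieved set
    original: str        # retrieved document d
    rewritten: str       # candidate rewrite d'
    delta: float = 0.0   # soft utility improvement (assumed definition)


def utility_improvement(score: AnswerLogProb, query: str, answer: str,
                        original: str, rewritten: str) -> float:
    """Assumed Delta-ell: gain in the generator's answer log-likelihood
    when the rewrite replaces the original document."""
    return score(query, rewritten, answer) - score(query, original, answer)


def label_rewrites(score: AnswerLogProb, query: str, answer: str,
                   rewrites: List[Rewrite]) -> None:
    """Attach a soft utility score to every candidate rewrite, in place."""
    for r in rewrites:
        r.delta = utility_improvement(score, query, answer, r.original, r.rewritten)


def split_win_lose(rewrites: List[Rewrite], threshold: float = 0.0
                   ) -> Tuple[List[Rewrite], List[Rewrite]]:
    """Partition rewrites by a hypothetical threshold on Delta-ell."""
    win = [r for r in rewrites if r.delta > threshold]
    lose = [r for r in rewrites if r.delta <= threshold]
    return win, lose


def build_training_data(win: List[Rewrite], lose: List[Rewrite]):
    """SFT targets come from wins. Each DPO (prompt, chosen, rejected) triple
    pairs a win and a lose rewrite of the same source document. The prompt is
    simplified to the original document here."""
    sft = [(w.original, w.rewritten) for w in win]
    dpo = [(w.original, w.rewritten, lo.rewritten)
           for w in win for lo in lose if w.doc_id == lo.doc_id]
    return sft, dpo


def toy_score(query: str, context: str, answer: str) -> float:
    # Stand-in for a real generator: favours contexts that contain the answer.
    return 0.0 if answer.lower() in context.lower() else -5.0


doc = "Hamlet is a tragedy set in Denmark."
cands = [Rewrite(0, doc, "Hamlet is a tragedy written by William Shakespeare."),
         Rewrite(0, doc, "Hamlet is set at Elsinore in Denmark.")]
label_rewrites(toy_score, "Who wrote Hamlet?", "Shakespeare", cands)
win, lose = split_win_lose(cands)
sft, dpo = build_training_data(win, lose)
```

With the toy scorer, the first rewrite gets $\Delta \ell = 5.0$ and lands in the win set. The second gets $\Delta \ell = 0.0$ and lands in the lose set. Together they form one DPO preference pair. Using a continuous score instead of a binary correct/incorrect label makes it possible to rank rewrites that do not change the final answer but still make it more or less likely.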

Paper Structure

This paper contains 42 sections, 7 equations, 5 figures, 15 tables.

Figures (5)

  • Figure 1: Accuracy gap between MS MARCO and CRAG across different bridging models. All models take the top-10 retrieved documents as input, rewrite them, and pass the results to the generator.
  • Figure 2: Overview of R2U. (a) Scaled Process Supervision: an LLM rewrites each document while conditioning on the remaining documents, yielding $(a,d')$ pairs suitable for distillation. (b) Utility as Soft Label: rather than relying on hard binary signals, we evaluate each $d'_i$ by measuring its soft utility improvement, and then divide $D$ into win and lose sets for training the rewriter via standard SFT and DPO.
  • Figure 3: Comparison of ACC across various query types in CRAG. The value in parentheses indicates the difference relative to Naive RAG.
  • Figure 4: Comparison of average F1 scores across various model sizes using Llama (filled markers) and Qwen (hollow markers).
  • Figure 5: Distribution of utility improvement ($\Delta \ell$) for win and lose rewrites.
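In Figure 2(a), the teacher LLM rewrites each document while conditioning on the rest of the retrieved set. The Python sketch below shows one way to build those leave-one-out teacher prompts and parse the completions. The prompt wording, the `REWRITE:`/`ANSWER:` output format, and the helper names `leave_one_out_prompts` and `parse_teacher_output` are hypothetical. Reading $a$ as the teacher's answer is also an assumption, not something the captions confirm.

```python
from typing import List, Tuple

# Hypothetical teacher prompt; the paper's actual prompt is not given here.
TEMPLATE = (
    "Question: {query}\n\n"
    "Other retrieved documents (context only):\n{others}\n\n"
    "Target document:\n{target}\n\n"
    "Rewrite the target document so it keeps the content needed to answer "
    "the question, then answer the question.\n"
    "Format:\nREWRITE: <rewritten document>\nANSWER: <answer>"
)


def leave_one_out_prompts(query: str, docs: List[str]) -> List[Tuple[int, str]]:
    """Build one teacher prompt per document d_i, conditioned on D without d_i."""
    prompts = []
    for i, target in enumerate(docs):
        # Every other document is shown as context; the target is excluded.
        others = "\n".join(
            f"[{j}] {d}" for j, d in enumerate(docs) if j != i
        ) or "(none)"
        prompts.append((i, TEMPLATE.format(query=query, others=others, target=target)))
    return prompts


def parse_teacher_output(text: str) -> Tuple[str, str]:
    """Split a completion in the assumed format into (answer a, rewrite d')."""
    head, sep, answer = text.partition("ANSWER:")
    if not sep:
        raise ValueError("completion is missing the ANSWER: field")
    rewrite = head.split("REWRITE:", 1)[-1]
    return answer.strip(), rewrite.strip()
```

For top-10 retrieval, as in Figure 1, this yields ten teacher calls per query. The rewrite and the answer come from the same completion, so each $(a, d')$ pair reflects one joint reasoning trace rather than two independent generations.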