MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models

Yujing Wang, Hainan Zhang, Liang Pang, Binghui Guo, Hongwei Zheng, Zhiming Zheng

TL;DR

MaFeRw tackles the challenge of rewriting queries in retrieval-augmented generation by introducing multi-aspect dense rewards derived from the gold document, retrieved documents, and ground truth. It initializes a T5-base rewriter, trains reward models via an RLAIF-inspired approach, and optimizes with PPO to maximize a composite reward that also includes the ROUGE score between the rewritten query and the manual rewrite. Empirical results on two conversational RAG datasets show that MaFeRw improves generation metrics and retrieval quality while providing more stable training than baselines. The approach also transfers to multi-document tasks, suggesting practical value for robust, scalable RAG systems.

Abstract

In a real-world RAG system, the current query often involves spoken ellipses and ambiguous references from dialogue contexts, necessitating query rewriting to better describe the user's information needs. However, traditional context-based rewriting yields minimal improvement on downstream generation tasks because of the lengthy process from query rewriting to response generation. Some researchers have tried to use reinforcement learning with generation feedback to assist the rewriter, but these sparse rewards provide little guidance in most cases, leading to unstable training and generation results. We find that the user's needs are also reflected in the gold document, retrieved documents, and ground truth. Therefore, by feeding back these multi-aspect dense rewards to query rewriting, more stable and satisfactory responses can be achieved. In this paper, we propose a novel query rewriting method, MaFeRw, which improves RAG performance by integrating multi-aspect feedback from both the retrieval process and the generated results. Specifically, we first use manual data to train a T5 model for rewriter initialization. Next, we design three metrics as reinforcement learning feedback: the similarity between the rewritten query and the gold document, the ranking metrics, and ROUGE between the generation and the ground truth. Inspired by RLAIF, we train three kinds of reward models for the above metrics to achieve more efficient training. Finally, we combine the scores of these reward models as feedback and use the PPO algorithm to explore the optimal query rewriting strategy. Experimental results on two conversational RAG datasets demonstrate that MaFeRw achieves superior generation metrics and more stable training compared to baselines.
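
As a rough illustration of how the feedback signals described above might be combined into a single scalar for PPO, the following Python sketch sums three reward-model scores together with a ROUGE term between the rewritten query and a manual rewrite. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `rouge1_f1` and `composite_reward`, the equal default weights, and the simplified whitespace-token ROUGE-1 are all hypothetical, and the three score arguments simply stand in for the outputs of trained reward models.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens.

    Assumption: no stemming or stopword handling, unlike full ROUGE toolkits.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def composite_reward(sim_score: float, rank_score: float, gen_score: float,
                     rewrite: str, manual_rewrite: str,
                     weights=(1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of three reward-model scores plus the rewrite ROUGE term.

    sim_score:  stand-in for the query / gold-document similarity reward model
    rank_score: stand-in for the ranking-metric reward model
    gen_score:  stand-in for the generation-ROUGE reward model
    weights:    hypothetical; this section does not state the actual weighting
    """
    w_sim, w_rank, w_gen, w_rw = weights
    return (w_sim * sim_score
            + w_rank * rank_score
            + w_gen * gen_score
            + w_rw * rouge1_f1(rewrite, manual_rewrite))
```

In a PPO loop, a value like this would be assigned to each sampled rewrite; because every term is available for every sample, the signal is dense compared with relying on end-to-end generation feedback alone.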

Paper Structure

This paper contains 32 sections, 8 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: An example of MaFeRw serving RAG and a comparison with the T5 rewriter. Green lines represent the inference process of the rewriter and RAG, while red lines indicate the three types of reward-metric feedback to MaFeRw.
  • Figure 2: The framework of MaFeRw. (a) The three feedback metrics are: the similarity between the rewritten query and the gold document, the ranking metric of similarity between the ground truth and the retrieved documents, and the ROUGE scores between the generation and the ground truth. Corresponding reward models are trained for these metrics. (b) When training the rewriter with the PPO algorithm, the reward is composed of the scores from the three reward models and the ROUGE score of the rewritten query.
  • Figure 3: Changes in ROUGE-1 and MRR as training iterations increase when rewrites are applied to RAG.
  • Figure 4: A case study comparing MaFeRw with two baselines.
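
For readers unfamiliar with the MRR metric tracked in Figure 3, the sketch below shows the standard mean reciprocal rank computation in Python. It assumes each query's retrieved documents are already reduced to a ranked list of relevance flags; the function name and input layout are illustrative choices, not taken from the paper.

```python
def mean_reciprocal_rank(ranked_relevance):
    """Mean reciprocal rank over a batch of queries.

    ranked_relevance: one list per query, holding booleans in retrieved order
    (True where the document at that rank is relevant). A query with no
    relevant document contributes 0.
    """
    if not ranked_relevance:
        return 0.0
    total = 0.0
    for flags in ranked_relevance:
        for rank, is_relevant in enumerate(flags, start=1):
            if is_relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_relevance)


# Three queries: relevant document at rank 1, at rank 2, and not retrieved.
example = [[True, False], [False, True], [False, False]]
print(mean_reciprocal_rank(example))
```

A rewrite that moves the relevant document closer to the top of the ranking raises this value, which is why it serves as a retrieval-quality indicator alongside ROUGE-1.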