Reward-RAG: Enhancing RAG with Reward Driven Supervision
Thang Nguyen, Peter Chin, Yu-Wing Tai
TL;DR
Reward-RAG introduces a reward-driven supervision loop to improve retrieval in RAG: a reward model is trained on CriticGPT-generated feedback that mimics human preferences. The reward model then guides domain-adaptive fine-tuning of the retrieval encoder, using synthetic $(q,d,r)$ data and hard-negative mining with an InfoNCE objective. Empirical results on open-domain and medical benchmarks show improved retrieval quality and generation relevance, with notable gains on PubMedQA and NQ, and ablations demonstrate the value of GPT-4o-based feedback and of prompt design. The framework offers a practical, scalable way to align RAG systems with human preferences and domain-specific needs, reducing reliance on costly human annotation while supporting cross-domain use.
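To make the fine-tuning objective concrete, here is a minimal sketch of an InfoNCE loss over one positive document and a set of mined hard negatives. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of cosine similarity, the temperature value, and the reading of $r$ as a binary relevance label from the reward model are all assumptions made for this example.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors given as lists of floats.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mine_hard_negatives(scored_docs, k=2):
    # scored_docs: list of (doc_vector, retriever_score, reward_label).
    # Assumption: reward_label is 1 if the reward model judges the document
    # relevant and 0 otherwise. Hard negatives are the rejected documents
    # that the current retriever ranks highest.
    rejected = [d for d in scored_docs if d[2] == 0]
    rejected.sort(key=lambda d: d[1], reverse=True)
    return [vec for vec, _, _ in rejected[:k]]

def info_nce(query, positive, negatives, temperature=0.05):
    # InfoNCE: negative log-probability of the positive under a softmax over
    # the temperature-scaled similarities of the positive and all negatives.
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]
```

The loss is close to zero when the query embedding is far more similar to the positive than to any negative, and grows when a hard negative outscores the positive, which is the signal that pushes the encoder toward the reward model's preferences.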
Abstract
In this paper, we introduce Reward-RAG, a novel approach designed to enhance the Retrieval-Augmented Generation (RAG) model through Reward-Driven Supervision. Unlike previous RAG methodologies, which focus on training language models (LMs) to utilize knowledge retrieved from external sources, our method adapts the retrieved information to specific domains by employing CriticGPT to train a dedicated reward model. This reward model generates synthesized datasets for fine-tuning the RAG encoder, aligning its outputs more closely with human preferences. The versatility of our approach allows it to be applied across various domains through domain-specific fine-tuning. We evaluate Reward-RAG on publicly available benchmarks from multiple domains, comparing it to state-of-the-art methods. Our experimental results demonstrate significant improvements in performance, highlighting the effectiveness of Reward-RAG in improving the relevance and quality of generated responses. These findings underscore the potential of integrating reward models with RAG to improve natural language generation tasks.
