PrefRAG: Preference-Driven Multi-Source Retrieval Augmented Generation

Qingfei Zhao; Ruobing Wang; Yukuo Cen; Daren Zha; Shicheng Tan; Jie Tang

TL;DR

PrefRAG tackles hallucination and knowledge gaps in LLMs with a multi-source retrieval augmentation framework guided by preference-driven adaptive retrieval and reinforced by self-reflection. It explores sources in an orderly, local-first manner (local corpus first, then the web) and switches sources only when needed, which reduces exposure to unreliable content. A data-construction pipeline combined with Direct Preference Optimization (DPO) aligns source-selection decisions with high-quality retrieval outcomes. Experiments across four QA datasets show gains of up to 25.6% over Vanilla RAG and up to 13.9% over the leading multi-source adaptive RAG baseline. The results indicate improved answer quality, retrieval efficiency, and controllability, pointing to PrefRAG's practical potential for reliable, preference-aware knowledge augmentation in real-world applications.

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a reliable external knowledge augmentation technique for mitigating hallucination and the limits of parametric knowledge in Large Language Models (LLMs). Existing adaptive RAG (ARAG) systems excel at in-depth exploration within a single source but struggle to explore different retrieval sources effectively and controllably, because they fail to foresee the internal knowledge features of those sources. We develop a novel multi-source ARAG (MS-ARAG) system, PrefRAG, which enhances RAG by enabling in-depth and controllable exploration of diverse retrieval sources through preference-driven adaptive retrieval and self-reflection. PrefRAG first fully explores controllable local sources during adaptive retrieval and supplements them with the web when appropriate, ultimately selecting the optimal source for knowledge observation. PrefRAG then feeds answer-quality feedback back into the retrieval process, optimizing it from the generation perspective to produce higher-quality responses. Extensive experiments confirm its superiority, high retrieval efficiency, and knowledge controllability. PrefRAG outperforms Vanilla RAG and the leading MS-ARAG by up to 25.6% and 13.9%, respectively. Additionally, PrefRAG trained with DPO achieves higher performance. The code and data are available at https://github.com/QingFei1/PrefRAG.git.
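
The control flow described in the abstract (explore local sources first, fall back to the web only when needed, then use answer-quality feedback to revisit retrieval) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `answer_with_preference`, the `Result` type, and every callable parameter (`retrieve_local`, `retrieve_web`, `local_is_sufficient`, `generate`, `is_acceptable`) are hypothetical stand-ins for components the abstract only names at a high level, and the single-retry reflection step is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    answer: str
    source: str          # "local" or "web": which source produced the final answer
    passages: List[str]  # passages the final answer was generated from


def answer_with_preference(
    question: str,
    retrieve_local: Callable[[str], List[str]],             # hypothetical local retriever
    retrieve_web: Callable[[str], List[str]],               # hypothetical web retriever
    local_is_sufficient: Callable[[str, List[str]], bool],  # hypothetical preference judge
    generate: Callable[[str, List[str]], str],              # hypothetical answer generator
    is_acceptable: Callable[[str, str], bool],              # hypothetical self-reflection check
) -> Result:
    # Step 1: always start with the controllable local source.
    passages = retrieve_local(question)
    source = "local"

    # Step 2: switch to the web only when the local evidence is judged insufficient.
    if not local_is_sufficient(question, passages):
        passages = retrieve_web(question)
        source = "web"

    answer = generate(question, passages)

    # Step 3 (assumed single retry): if the answer fails the quality check and
    # the web has not been tried yet, retrieve from the web and regenerate.
    if source == "local" and not is_acceptable(question, answer):
        passages = retrieve_web(question)
        source = "web"
        answer = generate(question, passages)

    return Result(answer=answer, source=source, passages=passages)
```

In this sketch the source preference is a plain boolean callable; in a system like the one described, that decision would come from the LLM itself (optionally after DPO training), and the reflection step could loop more than once.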
