Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation

Guanting Dong, Yutao Zhu, Chenghao Zhang, Zechen Wang, Zhicheng Dou, Ji-Rong Wen

TL;DR

This work tackles the misalignment between retrievers and LLM readers in retrieval-augmented generation by introducing DPA-RAG, a universal framework that achieves dual preference alignment. It combines preference knowledge construction with five query augmentation strategies, multi-grained reranker-LLM alignment (point-wise, pair-wise, and contrastive) jointly optimized via MGDA-UB, and an LLM self-alignment stage that precedes vanilla SFT. Empirical results on four knowledge-intensive QA benchmarks show that DPA-RAG consistently outperforms strong baselines, and ablation studies confirm that each component and training strategy contributes to the gains. The approach provides actionable guidance for building reliable RAG systems with improved knowledge alignment and reduced hallucination across diverse readers.
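To make the reranker alignment concrete, here is a minimal sketch of the three preference-alignment losses, assuming a cross-encoder that emits a relevance score per (query, document) pair plus embeddings for the contrastive term. All function names are illustrative, and the fixed loss weights stand in for the ones MGDA-UB would compute by solving a min-norm problem over per-task gradients.

```python
import torch
import torch.nn.functional as F

def pointwise_loss(scores, labels):
    # Point-wise: classify each document as aligned (1) or dis-preferred (0).
    return F.binary_cross_entropy_with_logits(scores, labels.float())

def pairwise_loss(pos_scores, neg_scores, margin=1.0):
    # Pair-wise: rank a preferred document above a dis-preferred one by a margin.
    return F.relu(margin - pos_scores + neg_scores).mean()

def contrastive_loss(query_emb, pos_emb, neg_embs, tau=0.05):
    # Contrastive: pull the preferred document toward the query,
    # push dis-preferred documents away (InfoNCE-style).
    pos_sim = F.cosine_similarity(query_emb, pos_emb, dim=-1) / tau                # (B,)
    neg_sim = F.cosine_similarity(query_emb.unsqueeze(1), neg_embs, dim=-1) / tau  # (B, K)
    logits = torch.cat([pos_sim.unsqueeze(1), neg_sim], dim=1)                     # (B, 1+K)
    targets = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)

def multi_grained_loss(l_point, l_pair, l_contrast, weights=(1.0, 1.0, 1.0)):
    # MGDA-UB would set `weights` adaptively; fixed values keep the sketch simple.
    w1, w2, w3 = weights
    return w1 * l_point + w2 * l_pair + w3 * l_contrast
```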

Abstract

Retrieval-augmented generation (RAG) has demonstrated effectiveness in mitigating the hallucination problem of large language models (LLMs). However, the difficulty of aligning the retriever with the diverse LLMs' knowledge preferences poses an inevitable challenge in developing a reliable RAG system. To address this issue, we propose DPA-RAG, a universal framework designed to align diverse knowledge preferences within RAG systems. Specifically, we initially introduce a preference knowledge construction pipeline and incorporate five novel query augmentation strategies to alleviate preference data scarcity. Based on preference data, DPA-RAG accomplishes both external and internal preference alignment: 1) It jointly integrates pair-wise, point-wise, and contrastive preference alignment abilities into the reranker, achieving external preference alignment among RAG components. 2) It further introduces a pre-aligned stage before vanilla Supervised Fine-tuning (SFT), enabling LLMs to implicitly capture knowledge aligned with their reasoning preferences, achieving LLMs' internal alignment. Experimental results across four knowledge-intensive QA datasets demonstrate that DPA-RAG outperforms all baselines and seamlessly integrates both black-box and open-sourced LLM readers. Further qualitative analysis and discussions also provide empirical guidance for achieving reliable RAG systems. Our code is publicly available at https://github.com/dongguanting/DPA-RAG.
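As a rough illustration of the internal alignment described above, the sketch below formats training examples for the two reader stages: a pre-aligned stage that asks the LLM to identify its preferred passage, followed by vanilla SFT on grounded question answering. The prompt templates and helper names are hypothetical, not the paper's exact format.

```python
def prealigned_example(query, docs, preferred_idx):
    # Pre-aligned stage: the LLM learns to pick the passage that matches
    # its reasoning preference before seeing any answer supervision.
    choices = "\n".join(f"[{i}] {d}" for i, d in enumerate(docs))
    prompt = (f"Question: {query}\nPassages:\n{choices}\n"
              f"Which passage is most helpful for answering the question?")
    return {"prompt": prompt, "completion": f"[{preferred_idx}]"}

def sft_example(query, docs, answer):
    # Vanilla SFT stage: answer the question grounded in retrieved passages.
    context = "\n".join(docs)
    prompt = f"Passages:\n{context}\nQuestion: {query}\nAnswer:"
    return {"prompt": prompt, "completion": f" {answer}"}

# Training order: fine-tune on prealigned_example data first, then continue
# fine-tuning the same model on sft_example data.
```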

Paper Structure

This paper contains 39 sections, 12 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: The results for GPT-3.5 comparing direct responses and answers referencing different retrieved documents (Grounding, 1st, 10th, 50th, 100th) on three QA benchmarks.
  • Figure 2: The overall framework of DPA-RAG. The upper part shows the pipeline for preference knowledge construction. The middle part displays the task format for dual preference alignment. The bottom part illustrates the inference process of DPA-RAG.
  • Figure 3: The scaling analysis of different parameter scales for HQA (left) and TQA (right).
  • Figure 4: The comparison experiment of preference alignment on NQ and TQA.
  • Figure 5: The left figure illustrates the visualization of different data complexity and diversity on NQ. The right figure shows performance of different training strategies on NQ.
  • ...and 1 more figure