
Pistis-RAG: Enhancing Retrieval-Augmented Generation with Human Feedback

Yu Bai, Yukai Miao, Li Chen, Dawei Wang, Dan Li, Yanyu Ren, Hongtao Xie, Ce Yang, Xuhui Cai

TL;DR

Experimental results indicate that Pistis-RAG aligns better with human preferences than the baseline RAG system, improving accuracy by 6.06% on MMLU (English) and by 7.08% on C-EVAL (Chinese). These results highlight Pistis-RAG's effectiveness in overcoming the limitations of traditional RAG approaches.

Abstract

RAG systems face limitations when semantic relevance alone does not guarantee better generation quality. The problem is compounded by the sensitivity of large language models (LLMs) to the ordering of few-shot prompts, which can affect model performance. A promising remedy is to align LLM outputs with human preferences using structured feedback, such as whether a user copies, regenerates, or dislikes a response. Because this feedback applies to the entire list of inputs rather than rating individual documents, learning from it is a Listwide Labels Learning-to-Rank task. To address this task, we propose Pistis-RAG, a new RAG framework that takes a content-centric approach to better align LLMs with human preferences. Pistis-RAG uses human feedback to improve both content ranking and generation quality. To validate the framework, we use public datasets to simulate human feedback, which lets us evaluate and refine our method. Experimental results indicate that Pistis-RAG improves alignment with human preferences relative to the baseline RAG system, increasing accuracy by 6.06% on MMLU (English) and by 7.08% on C-EVAL (Chinese). These results highlight Pistis-RAG's effectiveness in overcoming the limitations of traditional RAG approaches.
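To make the listwide-label setting concrete, here is a minimal Python sketch under stated assumptions. It is not the Pistis-RAG algorithm: the reward values in `ACTION_REWARD`, the DCG-style position discount, and the helpers `learn_doc_credit` and `rerank` are all hypothetical. It only illustrates how a single label attached to a whole ordered list can be spread over the documents in that list and then used to re-order future candidates.

```python
import math
from collections import defaultdict

# Hypothetical reward per feedback action (values are assumptions, not from
# the paper). Each reward labels the whole list, not a single document.
ACTION_REWARD = {"copy": 1.0, "regenerate": 0.0, "dislike": -1.0}


def position_weight(pos: int) -> float:
    """DCG-style discount: documents shown earlier receive more weight."""
    return 1.0 / math.log2(pos + 2)


def learn_doc_credit(impressions):
    """Spread each list-level reward over the documents shown in that list.

    impressions: iterable of (doc_ids_in_display_order, action) pairs.
    Returns a dict mapping doc_id -> position-weighted average reward.
    """
    total = defaultdict(float)
    weight = defaultdict(float)
    for doc_ids, action in impressions:
        reward = ACTION_REWARD[action]
        for pos, doc_id in enumerate(doc_ids):
            w = position_weight(pos)
            total[doc_id] += w * reward
            weight[doc_id] += w
    return {doc_id: total[doc_id] / weight[doc_id] for doc_id in total}


def rerank(candidates, credit):
    """Stable sort by learned credit, highest first; unseen docs score 0.0."""
    return sorted(candidates, key=lambda doc_id: -credit.get(doc_id, 0.0))


if __name__ == "__main__":
    logs = [(["d1", "d2", "d3"], "dislike"), (["d3", "d2"], "copy")]
    credit = learn_doc_credit(logs)
    print(rerank(["d1", "d2", "d3"], credit))  # ['d3', 'd2', 'd1']
```

The position weighting is one simple way to handle the credit-assignment problem that listwide labels create; a learned listwise ranker would replace the averaging step in a real system.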

Paper Structure

This paper contains 38 sections, 2 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Comparison between traditional Information Retrieval systems (such as Search Engines and Recommendation Systems) and Retrieval-Augmented Generation systems. The illustration highlights the differences in ranking processes, with RAG systems lacking a distinct ranking phase for alignment.
  • Figure 2: Pipeline of Pistis-RAG. The light blue sections represent the offline alignment process, and the light green sections represent the online request-handling process. User copy, regeneration, and dislike actions are collected and stored in the system's long-term memory as training data for Pistis-RAG alignment. This allows the RAG system to optimize search results for online user requests, aligning the retrieved content more closely with the LLM and user preferences so that the generated content better meets user expectations.
  • Figure 3: Overview of the RAG framework, from initial bi-encoder encoding through mixed retrieval to cross-encoder re-ranking.
  • Figure 4: Simulating feedback: comparison of online human feedback with simulated feedback. Correct answers ($Y_{\text{correct}}$) correspond to text copying, incorrect answers ($Y_{\text{incorrect}}$) to regeneration, and missing answers ($Y_{\text{no\_answer}}$) to negative user feedback.
  • Figure 5: RegEx patterns used in the RegEx-based extraction step to extract feedback.
  • ...and 2 more figures
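The feedback simulation described in the Figure 4 and Figure 5 captions can be sketched as follows. This is a hedged illustration, not the paper's code: `ANSWER_PATTERN` is a made-up pattern for multiple-choice answers (the actual RegEx patterns appear only in Figure 5), and `extract_choice` and `simulate_feedback` are hypothetical helpers. Modeling "negative user feedback" as the `"dislike"` action is likewise an assumption, based on the three actions named in Figure 2.

```python
import re
from typing import Optional

# Hypothetical pattern (NOT the paper's Figure 5 patterns): capture a choice
# letter A-D after "answer"/"answer is" (case-insensitive) or 答案/答案是.
# The letter itself is matched case-sensitively to avoid catching words like "a".
ANSWER_PATTERN = re.compile(r"(?:(?i:answer)|答案)\s*(?:(?i:is)|是)?\s*[:：]?\s*\(?([A-D])\b")


def extract_choice(output: str) -> Optional[str]:
    """Return the extracted choice letter, or None if no answer is found."""
    match = ANSWER_PATTERN.search(output)
    return match.group(1) if match else None


def simulate_feedback(output: str, gold: str) -> str:
    """Map an LLM answer to a simulated user action, following Figure 4:
    correct -> copy, incorrect -> regenerate, no answer -> negative feedback.
    """
    choice = extract_choice(output)
    if choice is None:
        return "dislike"  # assumption: negative feedback modeled as a dislike
    return "copy" if choice == gold else "regenerate"


if __name__ == "__main__":
    print(simulate_feedback("The answer is (B).", "B"))  # copy
    print(simulate_feedback("Answer: D", "B"))           # regenerate
    print(simulate_feedback("答案是C", "C"))              # copy
    print(simulate_feedback("I am not sure.", "A"))      # dislike
```

The simulated actions produced this way can serve as list-level labels for the retrieved context that produced each answer, which is what makes public benchmarks such as MMLU and C-EVAL usable as a stand-in for online feedback.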