VERA: Validation and Enhancement for Retrieval Augmented systems

Nitin Aravind Birur, Tanay Baswa, Divyanshu Kumar, Jatan Loya, Sahil Agarwal, Prashanth Harshangi

TL;DR

VERA addresses hallucinations and misalignment in retrieval-augmented generation by introducing an evaluator-cum-enhancer LLM that first refines the retrieved context and then refines the LLM's response. The system comprises four fine-grained stages, all guided by few-shot prompts: Retrieval Requirement Check, Retrieval Quality Evaluation and Correction, Response Relevancy Evaluation and Correction, and Response Adherence Evaluation and Correction. Across SQuAD-2.0, DROP, and real-world documents, VERA yields significant improvements in context relevance, response relevance, and adherence for both small and large LLMs, while reducing information loss and irrelevant content. This approach demonstrates practical gains in accuracy and reliability for knowledge-intensive RAG tasks, offering a scalable framework to mitigate hallucinations and improve grounding in generated content.

Abstract

Large language models (LLMs) exhibit remarkable capabilities but often produce inaccurate responses, as they rely solely on their embedded knowledge. Retrieval-Augmented Generation (RAG) enhances LLMs by incorporating an external information retrieval system, supplying additional context along with the query to mitigate inaccuracies for a particular context. However, accuracy issues remain, as the model may rely on irrelevant documents or extrapolate incorrectly from its training knowledge. To assess and improve the performance of both the retrieval system and the LLM in a RAG framework, we propose VERA (Validation and Enhancement for Retrieval Augmented systems), a system designed to: 1) evaluate and enhance the retrieved context before response generation, and 2) evaluate and refine the LLM-generated response to ensure precision and minimize errors. VERA employs an evaluator-cum-enhancer LLM that first checks whether external retrieval is necessary, then evaluates the relevance and redundancy of the retrieved context and refines it to eliminate non-essential information. After response generation, VERA splits the response into atomic statements, assesses their relevance to the query, and ensures adherence to the context. Our experiments demonstrate VERA's efficacy in improving the performance not only of smaller open-source models but also of larger state-of-the-art models. These enhancements underscore VERA's potential to produce accurate and relevant responses, advancing the state of the art in retrieval-augmented language modeling. VERA's methodology, combining multiple evaluation and refinement steps, mitigates hallucinations and improves retrieval and response processes, making it a valuable tool for applications demanding high accuracy and reliability in information generation.
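
The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the judge signatures, and the naive sentence splitter are all hypothetical, and the sketch only filters passages and statements, whereas the paper describes evaluation followed by correction. In VERA the three judges would be few-shot-prompted calls to the evaluator LLM; here they are plain callables so the flow can be run and tested without a model.

```python
import re
from typing import Callable, List

# Hypothetical judge signatures (assumptions, not from the paper).
NeedsRetrieval = Callable[[str], bool]   # query -> is retrieval needed?
Relevance = Callable[[str, str], bool]   # (query, text) -> is text relevant?
Support = Callable[[str, str], bool]     # (context, statement) -> is it grounded?


def split_statements(text: str) -> List[str]:
    """Naive sentence split, standing in for atomic-statement extraction."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vera_pipeline(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    needs_retrieval: NeedsRetrieval,
    is_relevant: Relevance,
    is_supported: Support,
) -> str:
    # Stage 1: retrieval requirement check.
    context: List[str] = []
    if needs_retrieval(query):
        # Stage 2: retrieval quality. Keep relevant passages, drop exact repeats.
        seen = set()
        for passage in retrieve(query):
            if passage not in seen and is_relevant(query, passage):
                seen.add(passage)
                context.append(passage)

    response = generate(query, context)
    statements = split_statements(response)

    # Stage 3: response relevancy. Drop statements unrelated to the query.
    statements = [s for s in statements if is_relevant(query, s)]

    # Stage 4: response adherence. Checked only when there is context to adhere to.
    if context:
        joined = " ".join(context)
        statements = [s for s in statements if is_supported(joined, s)]

    return " ".join(statements)
```

Passing the judges in as callables keeps the sketch independent of any particular model or prompt format; swapping in LLM-backed judges would not change the pipeline structure.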

Paper Structure

This paper contains 20 sections, 3 equations, 8 figures, 3 tables, 4 algorithms.

Figures (8)

  • Figure 1: An overview of VERA
  • Figure 2: An overview of methodology of VERA
  • Figure 3: Retrieval Requirement Check
  • Figure 4: Retrieval Quality Evaluation and Correction
  • Figure 5: Response Relevancy Evaluation and Correction
  • ...and 3 more figures