QuIM-RAG: Advancing Retrieval-Augmented Generation with Inverted Question Matching for Enhanced QA Performance

Binita Saha, Utsha Saha, Muhammad Zubair Malik

TL;DR

The paper tackles domain-specific QA by introducing QuIM-RAG, a retrieval-augmented generation system that uses inverted question matching with quantized embedding prototypes to retrieve highly relevant document chunks. It formalizes QA on limited corpora with $A = \arg\max_{A' \in C} \text{Relevance}(A', Q)$ and employs $p_l = \arg\min_p \text{CosineSimilarity}(v_{ijl}, p)$ to build an efficient inverted index, enabling precise, source-backed responses. Implemented on Meta-LLaMA3-8B-instruct and built over a 500+ page domain corpus from NDSU sites, the approach improves on traditional RAG as measured by BERTScore and RAGAS metrics. The work demonstrates that domain-focused data preparation and a tailored retrieval mechanism can mitigate information dilution and hallucination, and that explicit source links improve both accuracy and trust. The findings are practically relevant for deploying reliable QA systems in specialized domains that demand up-to-date, verifiable information.
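To make the prototype-based inverted index more concrete, here is a minimal sketch in plain Python. It reads the prototype assignment as "map each question embedding to its nearest prototype", and takes nearest to mean highest cosine similarity (equivalently, smallest cosine distance); that reading is an assumption, as are all function names, the two-dimensional vectors, and the chunk ids, none of which come from the paper.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_prototype(vector, prototypes):
    """Index of the prototype with the highest cosine similarity to `vector`."""
    return max(
        range(len(prototypes)),
        key=lambda i: cosine_similarity(vector, prototypes[i]),
    )


def build_inverted_index(question_vectors, prototypes):
    """Map each prototype index to the sorted chunk ids whose questions fall on it.

    `question_vectors` is a list of (chunk_id, embedding) pairs, one pair per
    generated question.
    """
    index = defaultdict(set)
    for chunk_id, vector in question_vectors:
        index[nearest_prototype(vector, prototypes)].add(chunk_id)
    return {proto: sorted(chunk_ids) for proto, chunk_ids in index.items()}


def lookup(query_vector, prototypes, index):
    """Quantize the query the same way and return the candidate chunk ids."""
    return index.get(nearest_prototype(query_vector, prototypes), [])


# Hypothetical toy data: two prototypes, three generated-question embeddings.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
question_vectors = [
    ("chunk-a", [0.9, 0.1]),
    ("chunk-b", [0.2, 0.8]),
    ("chunk-a", [0.1, 0.9]),
]

index = build_inverted_index(question_vectors, prototypes)
result = lookup([0.8, 0.3], prototypes, index)
```

Because the query is quantized with the same prototypes as the stored questions, a lookup only touches the chunks listed under one prototype rather than scanning every embedding.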

Abstract

This work presents a novel architecture for building Retrieval-Augmented Generation (RAG) systems to improve Question Answering (QA) tasks from a target corpus. Large Language Models (LLMs) have revolutionized the analysis and generation of human-like text. These models rely on pre-trained data and lack real-time updates unless integrated with live data tools. RAG enhances LLMs by integrating online resources and databases to generate contextually appropriate responses. However, traditional RAG still encounters challenges such as information dilution and hallucination when handling vast amounts of data. Our approach addresses these challenges by converting corpora into a domain-specific dataset and constructing a RAG architecture that generates responses from the target document. We introduce QuIM-RAG (Question-to-question Inverted Index Matching), a novel approach for the retrieval mechanism in our system. This strategy generates potential questions from document chunks and matches them against user queries to identify the most relevant text chunks for generating accurate answers. We implemented our RAG system on top of the open-source Meta-LLaMA3-8B-instruct model from Meta, which is available on Hugging Face. We constructed a custom corpus of 500+ pages from a high-traffic website accessed thousands of times daily for answering complex questions, along with manually prepared ground-truth QA pairs for evaluation. We compared our approach with traditional RAG models using BERTScore and RAGAS, state-of-the-art metrics for evaluating LLM applications. Our evaluation demonstrates that our approach outperforms traditional RAG architectures on both metrics.
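The question-to-question idea can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the bag-of-words `embed` function stands in for a real sentence encoder, the generated questions are hard-coded rather than produced by an LLM, and the chunk texts and `example.org` URLs are hypothetical placeholders.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[token] * v[token] for token in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical chunks, each carrying the source link it was taken from.
CHUNKS = {
    "c1": {
        "text": "Applications for fall admission are due on the first of March.",
        "source": "https://example.org/admissions",
    },
    "c2": {
        "text": "The library is open from 8 am to 10 pm on weekdays.",
        "source": "https://example.org/library",
    },
}

# Questions that an LLM might generate per chunk (hard-coded for this sketch).
GENERATED_QUESTIONS = [
    ("When is the application deadline for fall admission?", "c1"),
    ("What date are fall applications due?", "c1"),
    ("What are the library opening hours on weekdays?", "c2"),
    ("When does the library close?", "c2"),
]

# The index stores question embeddings, each pointing back to its chunk.
QUESTION_INDEX = [(embed(q), q, cid) for q, cid in GENERATED_QUESTIONS]


def retrieve(query, top_k=1):
    """Match the user query against stored questions, return distinct chunks."""
    query_vec = embed(query)
    ranked = sorted(
        QUESTION_INDEX, key=lambda entry: cosine(query_vec, entry[0]), reverse=True
    )
    seen, results = set(), []
    for _, question, chunk_id in ranked:
        if chunk_id not in seen:
            seen.add(chunk_id)
            results.append(
                {"chunk_id": chunk_id, "matched_question": question, **CHUNKS[chunk_id]}
            )
        if len(results) == top_k:
            break
    return results


hits = retrieve("What time does the library close on weekdays?")
```

The query is compared with questions rather than with raw chunk text, so a user question and a stored question about the same fact tend to share phrasing even when the chunk itself is worded differently. The retrieved chunk text and its `source` link would then be passed to the generator as context.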

Paper Structure (20 sections, 10 equations, 5 figures, 2 tables, 1 algorithm)

Figures (5)

  • Figure 1: Overall Architecture of Corpus Preparation for Modified RAG
  • Figure 2: Illustration of Inverted Index Construction for Question Matching
  • Figure 3: Overall Retrieval and Generation Architecture for RAG
  • Figure 4: The upper section details a prompt designed for creating a custom dataset, focusing on generating a set of questions for each chunk. The lower section outlines a prompt for a RAG system, emphasizing accuracy and directive responses based on the dataset, with instructions on how to handle queries that extend beyond the available data.
  • Figure 5: Workflow of QuIM-RAG system and Traditional RAG system for User Query Processing