Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases
Jiarui Li, Ye Yuan, Zehua Zhang
TL;DR
The paper tackles LLM hallucinations in domain-specific question answering by pairing a Retrieval Augmented Generation (RAG) pipeline with an external dataset focused on CMU and its Language Technologies Institute (LTI). It presents an end-to-end system covering web data collection, automated QA annotation, embedding and reranking, and answer generation with LLaMA-2, supported by ablations and case studies. The key contributions are the construction of the CMU/LTI dataset, a RAG pipeline tailored to knowledge-intensive tasks, and an evaluation that shows improved factual accuracy while exposing the limitations of small, skewed datasets. The work demonstrates the practical potential of augmenting LLMs with external data for domain-specific QA and provides a reproducible framework for future knowledge-intensive NLP systems.
Abstract
We propose an end-to-end system that uses Retrieval Augmented Generation (RAG) to improve the factual accuracy of Large Language Models (LLMs) on domain-specific and time-sensitive queries about private knowledge bases. The system integrates a RAG pipeline with upstream dataset processing and downstream performance evaluation. To address LLM hallucinations, we fine-tune models on a curated dataset that is drawn from CMU's extensive resources and annotated by a teacher model. Our experiments demonstrate the system's effectiveness in generating more accurate answers to domain-specific and time-sensitive questions. The results also reveal the limitations of fine-tuning LLMs on small-scale, skewed datasets. This research highlights the potential of RAG systems to augment LLMs with external datasets for better performance on knowledge-intensive tasks. Our code and models are available on GitHub.
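The retrieve, rerank, and generate stages of a RAG pipeline can be sketched in a few lines of Python. This is a minimal, self-contained illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` is a toy stand-in for a dense embedding model, the term-coverage `rerank` is a hypothetical stand-in for a learned reranker, all function names are hypothetical, the three-passage corpus is illustrative only, and the LLaMA-2 call is omitted. `build_prompt` only produces the grounded prompt that would be passed to the generator.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric word tokens; a stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Hypothetical toy "embedding": a bag-of-words count vector.
    # A real pipeline would use a dense embedding model here.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    # Stage 1: rank every passage by similarity to the query, keep the top-k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def rerank(query: str, candidates: list[str], n: int = 1) -> list[str]:
    # Stage 2 (hypothetical reranker): score each candidate by the fraction
    # of distinct query terms it contains; a cross-encoder would go here.
    q_terms = set(tokenize(query))

    def coverage(doc: str) -> float:
        return len(q_terms & set(tokenize(doc))) / len(q_terms) if q_terms else 0.0

    return sorted(candidates, key=coverage, reverse=True)[:n]


def build_prompt(query: str, contexts: list[str]) -> str:
    # Stage 3: ground the generator in the retrieved passages.
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\nAnswer:"
    )


# Usage on an illustrative toy corpus.
docs = [
    "The Language Technologies Institute (LTI) is part of the School of Computer Science at CMU.",
    "Pittsburgh is a city in Pennsylvania.",
    "It rained in Pittsburgh on Tuesday.",
]
query = "Which school is the LTI part of at CMU?"
candidates = retrieve(query, docs, k=2)
top = rerank(query, candidates, n=1)
prompt = build_prompt(query, top)
print(prompt)
```

Separating a cheap first-stage retriever from a more selective reranker mirrors the embedding-then-reranking split described in the TL;DR: the first stage narrows a large corpus to a few candidates, and the second stage spends more effort ordering only those.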
