AccurateRAG: A Framework for Building Accurate Retrieval-Augmented Question-Answering Applications

Linh The Nguyen; Chi Tran; Dung Ngoc Nguyen; Van-Cuong Pham; Hoang Ngo; Dat Quoc Nguyen

TL;DR

AccurateRAG targets end-to-end, high-accuracy retrieval-augmented QA by splitting the pipeline into data preprocessing, synthetic data generation, retrieval, and answer generation, each with tunable components. It combines semantic and conventional retrieval, validation-driven strategy selection, and LoRA-based fine-tuning to produce expanded-context training data and robust LLM outputs. Results on FinanceBench and other benchmarks show state-of-the-art QA performance, and ablations show the value of the Preprocessor, the Fine-tuning Data Generator, and model fine-tuning. The framework supports local deployment and rapid iteration for domain-specific, up-to-date QA tasks, offering a practical blueprint for building high-accuracy RAG systems.
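
As a rough illustration of what validation-driven strategy selection over semantic and conventional retrieval could look like, the sketch below scores each candidate strategy by recall@k on a small validation set and keeps the best one. It is not the AccurateRAG implementation: every function name is hypothetical, the character-trigram cosine is only a stand-in for a real text-embedding model, the term-overlap score is only a stand-in for a conventional ranker, and recall@k is an assumed selection metric.

```python
from collections import Counter
from math import sqrt


def tokenize(text):
    return text.lower().split()


def keyword_score(query, doc):
    """Conventional-retrieval stand-in: fraction of query terms found in the document."""
    q_terms = set(tokenize(query))
    if not q_terms:
        return 0.0
    d_terms = set(tokenize(doc))
    return len(q_terms & d_terms) / len(q_terms)


def semantic_score(query, doc):
    """Embedding stand-in (assumption): cosine similarity over character-trigram counts."""
    def trigrams(text):
        s = text.lower()
        return Counter(s[i:i + 3] for i in range(len(s) - 2))

    q, d = trigrams(query), trigrams(doc)
    dot = sum(q[g] * d[g] for g in q)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def hybrid_score(query, doc, alpha=0.5):
    """Weighted mix of the two scores; alpha is an illustrative parameter."""
    return alpha * semantic_score(query, doc) + (1 - alpha) * keyword_score(query, doc)


def retrieve(score_fn, query, docs, k):
    """Return the indices of the k highest-scoring documents."""
    ranked = sorted(range(len(docs)), key=lambda i: score_fn(query, docs[i]), reverse=True)
    return ranked[:k]


def recall_at_k(score_fn, validation, docs, k):
    """Share of validation queries whose gold document index appears in the top k."""
    hits = sum(1 for query, gold in validation if gold in retrieve(score_fn, query, docs, k))
    return hits / len(validation)


def select_strategy(strategies, validation, docs, k=1):
    """Pick the strategy with the highest validation recall@k."""
    scores = {name: recall_at_k(fn, validation, docs, k) for name, fn in strategies.items()}
    best = max(scores, key=scores.get)
    return best, scores


STRATEGIES = {
    "semantic": semantic_score,
    "conventional": keyword_score,
    "hybrid": hybrid_score,
}
```

Calling `select_strategy(STRATEGIES, validation, docs, k=3)` with a list of `(question, gold_document_index)` pairs returns the winning strategy name together with the per-strategy scores, so the choice can be re-run whenever the corpus or the validation set changes.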

Abstract

We introduce AccurateRAG -- a novel framework for constructing high-performance question-answering applications based on retrieval-augmented generation (RAG). Our framework offers a pipeline for development efficiency with tools for raw dataset processing, fine-tuning data generation, text embedding & LLM fine-tuning, output evaluation, and building RAG systems locally. Experimental results show that our framework outperforms previous strong baselines and obtains new state-of-the-art question-answering performance on benchmark datasets.
