AccurateRAG: A Framework for Building Accurate Retrieval-Augmented Question-Answering Applications
Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Van-Cuong Pham, Hoang Ngo, Dat Quoc Nguyen
TL;DR
AccurateRAG targets high-accuracy, end-to-end retrieval-augmented QA by splitting the pipeline into four tunable modules: data preprocessing, synthetic data generation, retrieval, and answer generation. It combines semantic and conventional retrieval, chooses among strategies based on validation performance, and applies LoRA-based fine-tuning on expanded-context training data to make LLM outputs more robust. Results on FinanceBench and several other benchmarks show state-of-the-art QA performance, and ablations show the contributions of the Preprocessor, the Fine-tuning Data Generator, and model fine-tuning. The framework supports local deployment and rapid iteration on domain-specific, up-to-date QA tasks, offering a practical blueprint for building high-accuracy RAG systems.
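As a rough illustration of validation-driven strategy selection, the sketch below scores three candidate retrieval strategies (conventional, semantic, hybrid) on a small validation set and keeps the one with the best hit rate. It is not the authors' implementation. Every name in it is hypothetical, the keyword-overlap scorer stands in for a conventional lexical retriever, the character-trigram cosine stands in for a real embedding model, and reciprocal rank fusion is an assumed way to combine the two.

```python
from collections import Counter
from math import sqrt


def keyword_score(query: str, doc: str) -> float:
    """Stand-in for a conventional lexical scorer: number of shared word tokens."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return float(sum((q & d).values()))


def trigram_vector(text: str) -> Counter:
    """Toy 'embedding': character trigram counts (placeholder for a real encoder)."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def semantic_score(query: str, doc: str) -> float:
    """Cosine similarity between the toy trigram vectors."""
    q, d = trigram_vector(query), trigram_vector(doc)
    dot = sum(q[g] * d[g] for g in q)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def rank(scorer, query, docs):
    """Return document indices sorted from best to worst under `scorer`."""
    return sorted(range(len(docs)), key=lambda i: scorer(query, docs[i]), reverse=True)


def hybrid_rank(query, docs, k=60):
    """Fuse both rankings with reciprocal rank fusion (an assumed fusion rule)."""
    fused = Counter()
    for scorer in (keyword_score, semantic_score):
        for pos, idx in enumerate(rank(scorer, query, docs)):
            fused[idx] += 1.0 / (k + pos + 1)
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)


# Candidate strategies; each maps (query, docs) to a ranked list of doc indices.
STRATEGIES = {
    "conventional": lambda q, docs: rank(keyword_score, q, docs),
    "semantic": lambda q, docs: rank(semantic_score, q, docs),
    "hybrid": hybrid_rank,
}


def hit_rate(strategy, validation, docs, top_k=1):
    """Fraction of validation queries whose gold document appears in the top_k."""
    hits = sum(gold in strategy(q, docs)[:top_k] for q, gold in validation)
    return hits / len(validation)


def select_strategy(validation, docs, top_k=1):
    """Score every strategy on the validation set and return the best one."""
    scores = {name: hit_rate(fn, validation, docs, top_k) for name, fn in STRATEGIES.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    docs = ["revenue grew ten percent in 2023", "the cat sat on the mat"]
    validation = [("revenue 2023", 0), ("cat mat", 1)]  # (query, gold doc index)
    print(select_strategy(validation, docs))
```

In a real system the two scorers would be replaced by an actual lexical index and a (possibly fine-tuned) embedding model, and the validation pairs would come from held-out question and passage annotations. The selection loop itself stays the same.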
Abstract
We introduce AccurateRAG, a novel framework for constructing high-performance question-answering applications based on retrieval-augmented generation (RAG). Our framework offers a pipeline for development efficiency, with tools for raw dataset processing, fine-tuning data generation, text embedding and LLM fine-tuning, output evaluation, and building RAG systems locally. Experimental results show that our framework outperforms previous strong baselines and obtains new state-of-the-art question-answering performance on benchmark datasets.
