AccurateRAG: A Framework for Building Accurate Retrieval-Augmented Question-Answering Applications

Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Van-Cuong Pham, Hoang Ngo, Dat Quoc Nguyen

TL;DR

AccurateRAG targets end-to-end, high-accuracy retrieval-augmented QA by modularizing the pipeline into data preprocessing, synthetic data generation, retrieval, and answer generation, each with tunable components. It combines semantic and conventional retrieval, validation-driven strategy selection, and LoRA-based fine-tuning to produce expanded-context training data and robust LLM outputs. Results on FinanceBench and other benchmarks show state-of-the-art QA performance, and ablations show the value of the Preprocessor, the Fine-tuning Data Generator, and model fine-tuning. The framework enables local deployment and rapid iteration for domain-specific, up-to-date QA tasks, offering a practical blueprint for building high-accuracy RAG systems.
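
To make the idea of validation-driven strategy selection concrete, here is a minimal sketch: score each candidate retrieval strategy on held-out (question, gold passage) pairs and keep the best one. Everything in it is an assumption for illustration and is not taken from the AccurateRAG code: the toy corpus, the `Retriever` interface, recall@k as the selection metric, character-trigram overlap as a stand-in for embedding-based semantic search, and reciprocal rank fusion as one possible way to combine the two rankings.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical interface: a retriever maps (query, k) to a ranked list of passage ids.
Retriever = Callable[[str, int], List[str]]

# Toy corpus, invented for this sketch.
CORPUS: Dict[str, str] = {
    "p1": "net revenue increased in fiscal year 2022",
    "p2": "the company repurchased shares during the quarter",
    "p3": "operating margin declined due to higher costs",
}


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def _trigrams(text: str) -> Set[str]:
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def _rank(query: str, featurize: Callable[[str], Set[str]], k: int) -> List[str]:
    """Rank passages by Jaccard overlap of features; ties broken by passage id."""
    q = featurize(query)

    def score(pid: str) -> float:
        d = featurize(CORPUS[pid])
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(CORPUS, key=lambda pid: (-score(pid), pid))[:k]


def keyword_retriever(query: str, k: int) -> List[str]:
    # Stand-in for conventional (lexical) search.
    return _rank(query, _tokens, k)


def fuzzy_retriever(query: str, k: int) -> List[str]:
    # Stand-in for semantic search; a real system would use a text embedding model.
    return _rank(query, _trigrams, k)


def hybrid_retriever(query: str, k: int) -> List[str]:
    # Reciprocal rank fusion of the two rankings (an assumed combination rule).
    fused: Dict[str, float] = {}
    for retriever in (keyword_retriever, fuzzy_retriever):
        for rank, pid in enumerate(retriever(query, len(CORPUS)), start=1):
            fused[pid] = fused.get(pid, 0.0) + 1.0 / (60 + rank)
    return sorted(fused, key=lambda pid: (-fused[pid], pid))[:k]


def recall_at_k(retriever: Retriever, validation: Sequence[Tuple[str, str]], k: int) -> float:
    """Fraction of validation questions whose gold passage appears in the top k."""
    if not validation:
        raise ValueError("validation set must not be empty")
    hits = sum(1 for query, gold in validation if gold in retriever(query, k))
    return hits / len(validation)


def select_strategy(
    strategies: Dict[str, Retriever],
    validation: Sequence[Tuple[str, str]],
    k: int = 5,
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the best-scoring strategy and all scores."""
    scores = {name: recall_at_k(fn, validation, k) for name, fn in strategies.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores


if __name__ == "__main__":
    strategies = {
        "keyword": keyword_retriever,
        "fuzzy": fuzzy_retriever,
        "hybrid": hybrid_retriever,
    }
    validation = [("share repurchases", "p2"), ("revenue in 2022", "p1")]
    print(select_strategy(strategies, validation, k=1))
```

Because the selection step only depends on the `Retriever` signature, a different metric or a real embedding-based retriever could be swapped in without changing `select_strategy`.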

Abstract

We introduce AccurateRAG -- a novel framework for constructing high-performance question-answering applications based on retrieval-augmented generation (RAG). Our framework offers a pipeline for development efficiency with tools for raw dataset processing, fine-tuning data generation, text embedding & LLM fine-tuning, output evaluation, and building RAG systems locally. Experimental results show that our framework outperforms previous strong baselines and obtains new state-of-the-art question-answering performance on benchmark datasets.

Paper Structure

This paper contains 17 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Architecture illustration of our AccurateRAG.
  • Figure 2: PDF content input.
  • Figure 3: Markdown-formatted text output.
  • Figure 4: Answer judgment prompt.
  • Figure 5: UI for the Preprocessor component and text embedding model fine-tuning in the semantic search module.
  • ...and 1 more figure