AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline

Dongkyu Kim, Byoungwook Kim, Donggeon Han, Matouš Eibich

TL;DR

We propose AutoRAG, a framework that automatically identifies suitable RAG modules for a given dataset and explores and approximates the optimal combination of those modules.

Abstract

The use of LLMs (Large Language Models) together with external documents has made RAG (Retrieval-Augmented Generation) an essential technology. Many RAG techniques and modules are under active research, but their performance varies across datasets, so finding modules that perform well on a specific dataset is challenging. In this paper, we propose the AutoRAG framework, which automatically identifies suitable RAG modules for a given dataset. AutoRAG explores and approximates the optimal combination of RAG modules for that dataset. We also share the results of optimizing a dataset using AutoRAG. All experimental results and data are publicly available in our GitHub repository: https://github.com/Marker-Inc-Korea/AutoRAG_ARAGOG_Paper
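
One way to read "explores and approximates the optimal combination" is as a stage-by-stage greedy search: fix the best module at one pipeline stage, then move to the next, instead of evaluating every full combination. The sketch below illustrates that idea only. The function `greedy_select`, the stage and module names, and the toy scoring table are all hypothetical and are not taken from the paper or from the AutoRAG codebase.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (stage name, candidate module names for that stage).
Stage = Tuple[str, Sequence[str]]


def greedy_select(
    stages: Sequence[Stage],
    score: Callable[[List[str]], float],
) -> List[str]:
    """Pick one module per stage, keeping earlier choices fixed.

    `score` evaluates a partial pipeline (list of module names chosen so far)
    and returns a number where higher is better. This is an assumed interface.
    """
    chosen: List[str] = []
    for _stage_name, candidates in stages:
        # Evaluate each candidate appended to the modules already chosen.
        best = max(candidates, key=lambda c: score(chosen + [c]))
        chosen.append(best)
    return chosen


# Toy scores standing in for real evaluation metrics (made-up numbers).
TOY_SCORES = {"bm25": 0.5, "dense": 0.6, "no_rerank": 0.0, "cross_encoder": 0.2}


def toy_score(pipeline: List[str]) -> float:
    return sum(TOY_SCORES[m] for m in pipeline)


stages = [
    ("retrieval", ["bm25", "dense"]),
    ("rerank", ["no_rerank", "cross_encoder"]),
]
best_pipeline = greedy_select(stages, toy_score)
print(best_pipeline)
```

In general, a greedy search of this kind evaluates a number of candidates equal to the sum of the stage sizes, whereas an exhaustive search evaluates their product. The trade-off is that the result is an approximation: a module that looks best at one stage in isolation may not belong to the best full pipeline.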

Paper Structure

This paper contains 50 sections, 6 equations, 11 figures, and 8 tables.

Figures (11)

  • Figure 1: Structural diagram showing the overall structure of AutoRAG.
  • Figure 2: All RAG techniques used in this paper.
  • Figure 3: An illustration of Query Decompose and HyDE query expansion modules.
  • Figure 4: An illustration of each passage reranker module.
  • Figure 5: Metrics used at each stage.
  • ...and 6 more figures