Retrieval Augmented Generation Systems: Automatic Dataset Creation, Evaluation and Boolean Agent Setup

Tristan Kenneweg; Philip Kenneweg; Barbara Hammer

TL;DR

A rigorous dataset creation and evaluation workflow for quantitatively comparing RAG strategies is presented, and a system is proposed in which an LLM decides whether or not to query a vector database, saving tokens on questions that can be answered with internal knowledge.

Abstract

Retrieval Augmented Generation (RAG) systems have become highly popular for augmenting Large Language Model (LLM) outputs with domain-specific and time-sensitive data. Recently, a shift has begun from simple RAG setups, which query a vector database for additional information on every user input, to more sophisticated forms of RAG. However, the competing concrete approaches currently rest mostly on anecdotal evidence. In this paper we present a rigorous dataset creation and evaluation workflow to quantitatively compare different RAG strategies. We use a dataset created this way to develop and evaluate a boolean agent RAG setup: a system in which an LLM can decide whether or not to query a vector database, thus saving tokens on questions that can be answered with internal knowledge. We publish our code and generated dataset online.
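The control flow of such a boolean agent setup can be illustrated with a minimal sketch. It is not the authors' published implementation: the function `boolean_agent_answer` and its three callables (`decide`, `retrieve`, `generate`) are hypothetical names introduced here, and in a real system they would wrap an LLM call, a vector database query, and a second LLM call, respectively.

```python
from typing import Callable, List


def boolean_agent_answer(
    question: str,
    decide: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Answer a question, querying the retriever only when `decide` says so.

    All three callables are assumptions for illustration:
      decide(question)            -> True if external context is needed
      retrieve(question)          -> list of retrieved passages
      generate(question, context) -> final answer text
    """
    context: List[str] = []
    if decide(question):
        # Retrieval is skipped entirely on a False decision, so the
        # retrieved passages never enter the prompt.
        context = retrieve(question)
    return generate(question, context)


# Toy stand-ins so the sketch runs without any model or database.
def toy_decide(question: str) -> bool:
    return "latest" in question.lower()


def toy_retrieve(question: str) -> List[str]:
    return ["stub passage for: " + question]


def toy_generate(question: str, context: List[str]) -> str:
    return f"answer to {question!r} using {len(context)} passage(s)"


if __name__ == "__main__":
    print(boolean_agent_answer("What is 2 + 2?", toy_decide, toy_retrieve, toy_generate))
    print(boolean_agent_answer("What is the latest news?", toy_decide, toy_retrieve, toy_generate))
```

Passing the three steps in as callables keeps the decision logic separate from any particular model or database, so that different deciders could be swapped in and compared on the same dataset.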

Paper Structure (7 sections, 4 figures, 3 tables)

This paper contains 7 sections, 4 figures, 3 tables.

Introduction
Related Work
Dataset & Evaluation Workflow
Dataset
Automatic Evaluation
Boolean Agent RAG Evaluation
Conclusion

Figures (4)

Figure 1: Sum of accuracy and relevance for different baseline test setups: a) no RAG, 300 random articles from $A_r$; b) no RAG, 300 articles from $A_d$; c) no RAG, all 256 articles from $A_f$; d) $A_f$ with the correct article supplied to the answerer.
Figure 2: Results of using naive RAG on $A_f$. The average truthfulness is 4.71 and average relevance is 4.66.
Figure 3: Schematic overview of the proposed boolean agent RAG system.
Figure 4: Results of advanced boolean agent RAG on a) $A_r$, b) $A_f$.
