Creating a Taxonomy for Retrieval Augmented Generation Applications

Irina Nikishina, Özge Sevgili, Mahei Manhai Li, Chris Biemann, Martin Semmann

TL;DR

The paper addresses the lack of a holistic taxonomy for Retrieval-Augmented Generation (RAG) applications and proposes a structured taxonomy developed through four iterative phases. Grounded in the taxonomy development method of Nickerson et al. (2013), it yields five meta-dimensions, sixteen dimensions, and sixty-one characteristics, validated via a literature review of 28 papers and ChatGPT-based domain clustering. The taxonomy covers general aspects, structure, data modalities, evaluation, and limitations, and outlines domain-specific applications, business value, ethical implications, and digital transformation considerations. It aims to support broader adoption, design knowledge, and future research by providing a clear, extensible framework for analyzing and developing RAG-based solutions across diverse domains.

Abstract

In this research, we develop a taxonomy that gives a comprehensive overview of the constituent characteristics that define retrieval augmented generation (RAG) applications, facilitating the adoption of this technology in different application domains. To the best of our knowledge, no holistic taxonomy of RAG applications has been developed so far. We employ a method that is new to the ACL community and thus contribute to the set of methods available for taxonomy creation. It comprises four iterative phases designed to refine and enhance our understanding and presentation of RAG's core dimensions. We have developed a total of five meta-dimensions and sixteen dimensions to comprehensively capture the concept of RAG applications. Thus, the taxonomy can be used to better understand RAG applications and to derive design knowledge for future solutions in specific application domains.

Paper Structure (37 sections, 2 figures)

Figures (2)

  • Figure 1: RAG Taxonomy created from twenty-eight papers within four iterations.
  • Figure 2: Development of taxonomy dimensions and characteristics (adapted from Bräker et al., 2022, and Remane et al., 2016)