
Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks

Yunfan Gao, Yun Xiong, Meng Wang, Haofen Wang

TL;DR

Modular RAG reframes retrieval-augmented generation as a three-tier architecture of modules, sub-modules, and operators to overcome the rigidity and fragmentation of traditional RAG pipelines. By defining a RAG Flow with dedicated orchestration (routing, scheduling, fusion) and enumerating six core modules, the paper offers a unified, graph-based approach to building flexible, scalable, and maintainable RAG systems. It analyzes five flow patterns (linear, conditional, branching, loop, and tuning), along with practical techniques for indexing, pre-retrieval, retrieval, post-retrieval, generation, and verification, and discusses compatibility with emerging methods and new operators. The work provides a theoretical foundation and a practical roadmap for evolving RAG technology, enabling diversified workflows, easier debugging, and broader deployment across heterogeneous data sources and applications.
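The three-tier hierarchy and graph-based flow described above can be pictured with a small data structure: a module groups sub-modules, each sub-module is a chain of operators, and a flow is a dependency graph over modules executed in topological order. This is a minimal sketch assuming a shared dictionary state passed between operators; the class `Module`, the function `run_flow`, and the toy operators (`normalize_query`, `keyword_retrieve`, `truncate_docs`, `template_generate`) are hypothetical names, not taken from the paper. Routing and scheduling are left out; a static acyclic graph only covers fixed orderings.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

# Assumption: operators read and write one shared dictionary of state.
State = Dict[str, Any]
Operator = Callable[[State], State]


class Module:
    """Top tier: a named module grouping sub-modules; each sub-module is a chain of operators."""

    def __init__(self, name: str, submodules: Dict[str, List[Operator]]) -> None:
        self.name = name
        self.submodules = submodules

    def __call__(self, state: State) -> State:
        # Sub-modules run in insertion order; operators inside each run in list order.
        for operators in self.submodules.values():
            for op in operators:
                state = op(state)
        return state


def run_flow(modules: Dict[str, Module], deps: Dict[str, List[str]], state: State) -> State:
    # deps maps each module name to the modules it depends on (a DAG);
    # static_order() yields an order in which every dependency runs first.
    for name in TopologicalSorter(deps).static_order():
        state = modules[name](state)
    return state


# --- Hypothetical toy operators ---
CORPUS = [
    "Modular RAG splits pipelines into modules.",
    "Operators are the smallest functional units.",
    "Paris is the capital of France.",
]


def normalize_query(s: State) -> State:
    return {**s, "query": s["query"].strip().lower()}


def keyword_retrieve(s: State) -> State:
    # Keep documents sharing at least one word with the query.
    terms = set(s["query"].split())
    hits = [d for d in CORPUS if terms & set(d.lower().rstrip(".").split())]
    return {**s, "docs": hits}


def truncate_docs(s: State) -> State:
    return {**s, "docs": s["docs"][:2]}


def template_generate(s: State) -> State:
    # Stand-in for an LLM: concatenate the retrieved passages.
    return {**s, "answer": " ".join(s["docs"]) or "No answer found."}


modules = {
    "pre_retrieval": Module("pre_retrieval", {"query_transform": [normalize_query]}),
    "retrieval": Module("retrieval", {"sparse": [keyword_retrieve]}),
    "post_retrieval": Module("post_retrieval", {"compression": [truncate_docs]}),
    "generation": Module("generation", {"template": [template_generate]}),
}
deps = {
    "retrieval": ["pre_retrieval"],
    "post_retrieval": ["retrieval"],
    "generation": ["post_retrieval"],
}
result = run_flow(modules, deps, {"query": "  What are OPERATORS  "})
print(result["answer"])
```

Swapping a sub-module (for example, replacing `keyword_retrieve` with a dense retriever) changes one dictionary entry without touching the rest of the flow, which is the kind of reconfigurability the paper emphasizes.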

Abstract

Retrieval-augmented Generation (RAG) has markedly enhanced the capabilities of Large Language Models (LLMs) in tackling knowledge-intensive tasks. The increasing demands of application scenarios have driven the evolution of RAG, leading to the integration of advanced retrievers, LLMs, and other complementary technologies, which in turn has amplified the intricacy of RAG systems. However, the rapid advancements are outpacing the foundational RAG paradigm, with many methods struggling to be unified under the process of "retrieve-then-generate". In this context, this paper examines the limitations of the existing RAG paradigm and introduces the modular RAG framework. By decomposing complex RAG systems into independent modules and specialized operators, it enables a highly reconfigurable framework. Modular RAG transcends the traditional linear architecture, embracing a more advanced design that integrates routing, scheduling, and fusion mechanisms. Drawing on extensive research, this paper further identifies prevalent RAG patterns (linear, conditional, branching, and looping) and offers a comprehensive analysis of their respective implementation nuances. Modular RAG presents innovative opportunities for the conceptualization and deployment of RAG systems. Finally, the paper explores the potential emergence of new operators and paradigms, establishing a solid theoretical foundation and a practical roadmap for the continued evolution and practical deployment of RAG technologies.
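The orchestration mechanisms named in the abstract (routing, scheduling, fusion) and the conditional, branching, and looping patterns can be sketched as higher-order functions over retrieval pipelines. This is a minimal illustration under stated assumptions, not the paper's implementation: the keyword classifier, the fixed-pool loop step, and the stopping judge are hypothetical stand-ins, and reciprocal rank fusion (RRF, with the conventional k = 60) is one common fusion technique swapped in for illustration; the paper treats fusion more generally.

```python
from typing import Callable, Dict, List

Pipeline = Callable[[str], List[str]]


def conditional_flow(query: str, routes: Dict[str, Pipeline],
                     classify: Callable[[str], str]) -> List[str]:
    # Routing: a classifier picks exactly one downstream pipeline for this query.
    return routes[classify(query)](query)


def branching_flow(query: str, branches: List[Pipeline],
                   fuse: Callable[[List[List[str]]], List[str]]) -> List[str]:
    # Branching: run every pipeline on the same query, then merge with a fusion operator.
    return fuse([branch(query) for branch in branches])


def loop_flow(query: str, step: Callable[[str, List[str]], List[str]],
              judge: Callable[[str, List[str]], bool], max_iters: int = 3) -> List[str]:
    # Looping: after each round, a judge (the scheduling decision) says whether to stop.
    context: List[str] = []
    for _ in range(max_iters):
        context = step(query, context)
        if judge(query, context):
            break
    return context


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    # Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank of d).
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by doc id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# --- Hypothetical toy stand-ins ---
def sparse(q: str) -> List[str]:
    return ["d1", "d2", "d3"]


def dense(q: str) -> List[str]:
    return ["d3", "d1", "d4"]


def code_search(q: str) -> List[str]:
    return ["py_doc"]


def classify(q: str) -> str:
    # Keyword rule standing in for an LLM- or model-based router.
    return "code" if "python" in q.lower() else "general"


POOL = ["d1", "d2", "d3", "d4"]


def step(q: str, ctx: List[str]) -> List[str]:
    # Each round "retrieves" the next document from the pool.
    return ctx + [POOL[len(ctx)]]


def judge(q: str, ctx: List[str]) -> bool:
    # Stop once two documents have been gathered.
    return len(ctx) >= 2


routed = conditional_flow("How do I sort a list in Python?",
                          {"code": code_search, "general": sparse}, classify)
fused = branching_flow("capital of France", [sparse, dense], rrf_fuse)
looped = loop_flow("capital of France", step, judge)
```

Because each pattern takes its pipelines, router, fuser, or judge as arguments, the same pattern can be reused with any concrete modules plugged in.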

Paper Structure (41 sections, 30 equations, 17 figures, 1 table, 7 algorithms)

Figures (17)

  • Figure 1: Cases of Naive RAG and Advanced RAG. When faced with complex questions, both encounter limitations and struggle to provide satisfactory answers. Although Advanced RAG improves retrieval accuracy through hierarchical indexing, pre-retrieval, and post-retrieval processes, the relevant documents it retrieves are still not used correctly.
  • Figure 2: Case of current Modular RAG. The system integrates diverse data and more functional components. The process is no longer confined to a linear sequence; multiple control components govern retrieval and generation, making the entire system more flexible and more complex.
  • Figure 3: Comparison between three RAG paradigms. Modular RAG has evolved from previous paradigms and aligns with the current practical needs of RAG systems.
  • Figure 4: Linear RAG flow pattern. Each module is processed in a fixed sequential order.
  • Figure 5: RRR. RRR is a typical linear flow that introduces a learnable query-rewrite module before retrieval. This module is trained with reinforcement learning, using the LLM's output as the feedback signal.
  • ...and 12 more figures
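The linear Rewrite-Retrieve-Read flow from Figure 5 can be sketched as three stages run once each in a fixed order, with a reward computed from the reader's output. This is a toy sketch: the corpus, the substring retriever, the first-document reader, and both rewriters are hypothetical stand-ins, and the reinforcement-learning update RRR applies to its rewriter is omitted. The `reward` function only shows where that feedback signal would come from; scoring by answer match is an assumed choice.

```python
from typing import Callable, Dict, List

# Hypothetical toy corpus keyed by a lookup term.
CORPUS: Dict[str, str] = {
    "eiffel": "The Eiffel Tower was completed in 1889.",
    "louvre": "The Louvre is the world's most-visited museum.",
}


def retrieve(query: str) -> List[str]:
    # Toy retriever: return documents whose key appears in the query.
    return [text for key, text in CORPUS.items() if key in query.lower()]


def read(query: str, docs: List[str]) -> str:
    # Stand-in for the LLM reader: answer with the first retrieved document, if any.
    return docs[0] if docs else "unknown"


def rewrite_retrieve_read(query: str, rewriter: Callable[[str], str]) -> str:
    # Linear flow: rewrite -> retrieve -> read, each stage run once in a fixed order.
    rewritten = rewriter(query)
    docs = retrieve(rewritten)
    return read(query, docs)  # the reader still sees the original question


def reward(answer: str, gold: str) -> float:
    # Hypothetical reward from the reader's output; RRR feeds a signal
    # of this kind back to train the rewriter (training not shown).
    return 1.0 if gold.lower() in answer.lower() else 0.0


def identity_rewriter(q: str) -> str:
    return q


def keyword_rewriter(q: str) -> str:
    # Hypothetical rule-based rewrite standing in for the learned rewriter.
    return q.replace("that iron tower in Paris", "Eiffel")


query = "When was that iron tower in Paris finished?"
for name, rewriter in [("identity", identity_rewriter), ("keyword", keyword_rewriter)]:
    answer = rewrite_retrieve_read(query, rewriter)
    print(name, answer, reward(answer, "1889"))
```

The unrewritten query retrieves nothing and earns zero reward, while the rewritten one retrieves the right passage; a trainable rewriter would be pushed toward rewrites of the second kind.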