
SPD-RAG: Sub-Agent Per Document Retrieval-Augmented Generation

Yagiz Can Akay, Muhammed Yusuf Kartal, Esra Alparslan, Faruk Ortakoyluoglu, Arda Akpinar

TL;DR

This work introduces SPD-RAG, a hierarchical multi-agent framework for exhaustive cross-document question answering that decomposes the problem along the document axis. It improves scalability and answer quality in heterogeneous multi-document settings while yielding a modular, extensible retrieval pipeline.

Abstract

Answering complex, real-world queries often requires synthesizing facts scattered across vast document corpora. In these settings, standard retrieval-augmented generation (RAG) pipelines suffer from incomplete evidence coverage, while long-context large language models (LLMs) struggle to reason reliably over massive inputs. We introduce SPD-RAG, a hierarchical multi-agent framework for exhaustive cross-document question answering that decomposes the problem along the document axis. Each document is processed by a dedicated document-level agent that operates only on its own content, enabling focused retrieval, while a coordinator dispatches tasks to relevant agents and collects their partial answers. These partial answers are merged by a token-bounded synthesis layer, which supports recursive map-reduce for massive corpora. This document-level specialization with centralized fusion improves scalability and answer quality in heterogeneous multi-document settings while yielding a modular, extensible retrieval pipeline. On the LOONG benchmark (EMNLP 2024) for long-context multi-document QA, SPD-RAG achieves an Avg Score of 58.1 (GPT-5 evaluation), outperforming Normal RAG (33.0) and Agentic RAG (32.8) while using only 38% of the API cost of a full-context baseline, which scores 68.0.
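
The flow described in the abstract (one agent per document, then a token-bounded merge that recurses when the partial answers do not fit in one pass) can be illustrated with a minimal sketch. This is not the authors' implementation. Every name here (`count_tokens`, `pack_into_batches`, `synthesize`, `answer`, `keyword_agent`, `join_merge`) is hypothetical, the whitespace token count stands in for a real tokenizer, and the two toy functions stand in for LLM calls. As a simplification, the sketch queries every document and drops empty replies instead of routing queries to relevant agents only.

```python
from typing import Callable, List

MergeFn = Callable[[List[str]], str]
AgentFn = Callable[[str, str], str]


def count_tokens(text: str) -> int:
    """Crude whitespace token count (assumption: a real system uses the model tokenizer)."""
    return len(text.split())


def pack_into_batches(parts: List[str], budget: int) -> List[List[str]]:
    """Greedily group partial answers so each batch stays within the token budget."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for part in parts:
        size = count_tokens(part)
        if current and used + size > budget:
            batches.append(current)
            current, used = [], 0
        current.append(part)
        used += size
    if current:
        batches.append(current)
    return batches


def synthesize(parts: List[str], merge: MergeFn, budget: int) -> str:
    """Recursive map-reduce: merge batches of partial answers until one answer remains."""
    if not parts:
        return ""
    if len(parts) == 1:
        return parts[0]
    batches = pack_into_batches(parts, budget)
    if len(batches) == len(parts):
        # No batch holds two parts, so packing made no progress.
        # Fall back to pairwise merging so the recursion always terminates.
        batches = [parts[i:i + 2] for i in range(0, len(parts), 2)]
    merged = [merge(batch) if len(batch) > 1 else batch[0] for batch in batches]
    return synthesize(merged, merge, budget)


def answer(query: str, documents: List[str], agent: AgentFn,
           merge: MergeFn, budget: int) -> str:
    """Coordinator: ask one agent per document, drop empty replies, then synthesize."""
    partials = [agent(query, doc) for doc in documents]
    return synthesize([p for p in partials if p], merge, budget)


def keyword_agent(query: str, document: str) -> str:
    """Toy stand-in for an LLM agent: keep sentences that share a word with the query."""
    terms = set(query.lower().split())
    hits = [s.strip() for s in document.split(".") if terms & set(s.lower().split())]
    return ". ".join(hits)


def join_merge(parts: List[str]) -> str:
    """Toy stand-in for an LLM merge call: concatenate the partial answers."""
    return " | ".join(parts)
```

Calling `answer("revenue", docs, keyword_agent, join_merge, budget=100)` on a short list of strings runs the whole path. Lowering `budget` forces `synthesize` to merge in several rounds, which is the behaviour the recursive map-reduce layer is meant to provide for large corpora.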

Paper Structure

This paper contains 47 sections, 1 equation, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of SPD-RAG Architecture.
  • Figure 2: Comparison of Average Score across the four systems, broken down by task type (Spotlight Locating, Comparison, Clustering, and Chain of Reasoning).
  • Figure 3: Average score by document domain comparing the Baseline (Full Context), Normal RAG, Agentic RAG, and SPD-RAG systems.
  • Figure 4: Cost–Quality tradeoff. The scatter plot illustrates the Pareto frontier for multi-document QA.
  • Figure 5: Average per-query latency across systems.