
Mixture-of-RAG: Integrating Text and Tables with Large Language Models

Chi Zhang, Qiyang Chen, Mengqi Zhang

TL;DR

The paper tackles the challenge of grounding LLMs in heterogeneous documents that combine narrative text with hierarchical tables. It introduces MixRAG, a three-stage framework that preserves table hierarchy with H-RCL representations, performs cross-modal retrieval via ensemble ranking and LLM reranking, and applies RECAP-style multi-step reasoning with external calculators for arithmetic. To support this line of work, it provides DocRAGLib, a 2,178-document, 4,468-QA-pair benchmark for heterogeneous text-table documents. Empirically, MixRAG achieves state-of-the-art retrieval and QA performance on DocRAGLib, significantly outperforming text-only and table-only baselines and demonstrating favorable efficiency trade-offs for complex document grounding.
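The minimal sketch below makes the H-RCL idea more concrete. It shows one way to turn a table with multi-level headers into row-level and column-level text units, where every value keeps its full header path. The `(row path, column path, value)` cell format, the `serialize_units` helper, and the `" > "` separator are illustrative assumptions. They are not the paper's actual H-RCL encoding.

```python
from typing import Dict, List, Tuple

# Hypothetical cell format (an assumption, not the paper's data model):
# (row header path, column header path, cell value), outermost header first.
Cell = Tuple[Tuple[str, ...], Tuple[str, ...], str]


def serialize_units(cells: List[Cell], axis: str) -> Dict[str, str]:
    """Build one text unit per row (axis="row") or per column (axis="col").

    Each unit is keyed by the full header path along `axis`, and each value is
    labelled with the full header path along the other axis. A nested header
    such as "Revenue > Domestic" is therefore never flattened to "Domestic".
    """
    if axis not in ("row", "col"):
        raise ValueError("axis must be 'row' or 'col'")
    units: Dict[str, List[str]] = {}
    for row_path, col_path, value in cells:
        key_path, other_path = (row_path, col_path) if axis == "row" else (col_path, row_path)
        key = " > ".join(key_path)
        units.setdefault(key, []).append(f"{' > '.join(other_path)}: {value}")
    # Prefix each unit with its own path so the unit is self-describing.
    return {key: f"{key} | " + "; ".join(items) for key, items in units.items()}


# Toy two-level table (made-up numbers for illustration only).
cells: List[Cell] = [
    (("Revenue", "Domestic"), ("2023",), "120"),
    (("Revenue", "Domestic"), ("2024",), "135"),
    (("Revenue", "International"), ("2023",), "80"),
    (("Revenue", "International"), ("2024",), "95"),
]

rows = serialize_units(cells, "row")
cols = serialize_units(cells, "col")
for text in list(rows.values()) + list(cols.values()):
    print(text)
```

Each unit carries its complete path, for example `Revenue > Domestic`. A retriever can therefore match a question about domestic revenue in 2024 against either the row unit or the `2024` column unit.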

Abstract

Large language models (LLMs) are most useful when their responses are grounded in external knowledge sources. However, real-world documents, such as annual reports, scientific papers, and clinical guidelines, frequently combine extensive narrative content with complex, hierarchically structured tables. Existing retrieval-augmented generation (RAG) systems effectively integrate the generative capabilities of LLMs with externally retrieved information. Even so, their performance deteriorates significantly when they process such heterogeneous text-table hierarchies. To address this limitation, we formalize the task of Heterogeneous Document RAG, which requires joint retrieval and reasoning across textual and hierarchical tabular data. We propose MixRAG, a novel three-stage framework: (i) a hierarchy row-and-column-level (H-RCL) representation that preserves hierarchical structure and heterogeneous relationships, (ii) an ensemble retriever with LLM-based reranking for evidence alignment, and (iii) multi-step reasoning decomposition via a RECAP prompting strategy. To fill the gap in available data for this domain, we release DocRAGLib, a 2k-document corpus paired with automatically aligned text-table summaries and gold document annotations. Comprehensive experimental results show that MixRAG improves top-1 retrieval by 46% over strong text-only, table-only, and naive-mixture baselines. These results establish new state-of-the-art performance for mixed-modality document grounding.
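One way to sketch the ensemble-plus-reranking stage is in two steps. First, fuse the candidate rankings from separate retrievers. Then reorder a short list with a relevance scorer. This sketch uses reciprocal rank fusion (RRF) as a stand-in fusion rule; that choice is an assumption, not the method the paper states. The `rerank_score` callable is a hypothetical hook where an LLM reranker would plug in, and the document ids and toy scores are made up for illustration.

```python
from typing import Callable, Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document ids with RRF: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def ensemble_then_rerank(
    rankings: Sequence[List[str]],
    rerank_score: Callable[[str], float],
    top_n: int = 3,
) -> List[str]:
    """Fuse candidate lists, keep the top_n, then reorder them by rerank_score.

    `rerank_score` is a hypothetical stand-in for an LLM relevance judgment.
    """
    candidates = reciprocal_rank_fusion(rankings)[:top_n]
    return sorted(candidates, key=rerank_score, reverse=True)


# Hypothetical outputs of a text retriever and a table retriever for one question.
text_ranking = ["doc_a", "doc_b", "doc_c"]
table_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([text_ranking, table_ranking])
print(fused)

# Toy reranker: pretend the LLM judged doc_c most relevant.
toy_scores = {"doc_a": 0.4, "doc_b": 0.1, "doc_c": 0.9, "doc_d": 0.2}
reranked = ensemble_then_rerank([text_ranking, table_ranking], lambda d: toy_scores[d])
print(reranked)
```

With these toy inputs, `doc_a` and `doc_c` lead after fusion because both retrievers rank them highly. The stand-in reranker then promotes `doc_c` to first place.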

Paper Structure

This paper contains 29 sections, 10 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: An Example of Heterogeneous Document RAG Task
  • Figure 2: Distribution of domains in DocRAGLib
  • Figure 3: The Overview of our MixRAG Framework
  • Figure 4: The Path and Hierarchical Levels in the Table
  • Figure 5: Scalability Analysis of Retrieval Performance across Different Corpus Sizes.
  • ...and 1 more figure