Optimizing open-domain question answering with graph-based retrieval augmented generation

Joyce Cahoon, Prerna Singh, Nick Litombe, Jonathan Larson, Ha Trinh, Yiwen Zhu, Andreas Mueller, Fotis Psallidas, Carlo Curino

TL;DR

The paper tackles open-domain QA by distinguishing OLTP and OLAP query types and argues that graph-based RAG can better synthesize information across multiple documents. It introduces TREX, a cost-efficient hybrid that extends RAPTOR’s hierarchical clustering with vector-based retrieval, enabling scalable, multi-document reasoning. Through benchmarking on four diverse datasets and a real-world case study, TREX demonstrates competitive accuracy and substantial cost savings relative to GraphRAG, RAPTOR, and Azure Hybrid Search, while also revealing evaluation and synthesis challenges in current open-domain QA. The work provides practical guidance on when to deploy graph-based vs vector-based retrieval and emphasizes the need for standardized evaluation metrics to reliably assess faithfulness and usefulness in generated answers.
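The TL;DR describes TREX as extending RAPTOR's hierarchical clustering with vector-based retrieval. Below is a minimal Python sketch of that general idea under stated assumptions. It is not the authors' implementation. Chunks are recursively clustered and summarized into a tree. Every node, whether a leaf or a summary, then goes into a single flat index that is searched by cosine similarity. Several pieces are hypothetical stand-ins:

- The toy bag-of-words `embed` replaces a real embedding model.
- The concatenating `summarize` stub replaces an LLM summarization call.
- The greedy `cluster` function replaces the clustering RAPTOR actually uses.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical toy embedding: bag-of-words counts instead of a dense model vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def summarize(texts: list[str]) -> str:
    # Hypothetical stand-in for an LLM summarization call: plain concatenation.
    return " ".join(texts)

def cluster(nodes: list[dict], max_size: int) -> list[list[int]]:
    # Greedy grouping around a seed node by similarity; a simplified stand-in
    # for the clustering RAPTOR actually uses.
    remaining = list(range(len(nodes)))
    groups = []
    while remaining:
        seed = remaining.pop(0)
        remaining.sort(key=lambda i: cosine(nodes[seed]["vec"], nodes[i]["vec"]), reverse=True)
        groups.append([seed] + remaining[: max_size - 1])
        remaining = remaining[max_size - 1 :]
    return groups

def build_tree(chunks: list[str], max_size: int = 2) -> list[dict]:
    """Cluster and summarize layer by layer until a single root remains."""
    if max_size < 2:
        raise ValueError("max_size must be at least 2 so each layer shrinks")
    layer = [{"text": c, "vec": embed(c), "level": 0} for c in chunks]
    index = list(layer)  # every node, leaf or summary, goes into one flat index
    level = 0
    while len(layer) > 1:
        level += 1
        next_layer = []
        for group in cluster(layer, max_size):
            summary = summarize([layer[i]["text"] for i in group])
            next_layer.append({"text": summary, "vec": embed(summary), "level": level})
        index.extend(next_layer)
        layer = next_layer
    return index

def retrieve(index: list[dict], query: str, k: int = 2) -> list[dict]:
    """Collapsed-tree search: rank nodes from all levels by cosine similarity."""
    q = embed(query)
    return sorted(index, key=lambda n: cosine(q, n["vec"]), reverse=True)[:k]

chunks = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock prices fell on monday",
    "markets rallied after the earnings report",
]
tree = build_tree(chunks)
for node in retrieve(tree, "cat on the mat", k=1):
    print(node["level"], node["text"])  # prints: 0 the cat sat on the mat
```

Retrieval searches leaves and summaries together. As a result, a narrow, fact-style query can match a leaf chunk, while a broader, thematic query can match a higher-level summary node.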

Abstract

In this work, we benchmark various graph-based retrieval-augmented generation (RAG) systems across a broad spectrum of query types, including OLTP-style (fact-based) and OLAP-style (thematic) queries, to address the complex demands of open-domain question answering (QA). Traditional RAG methods often fall short in handling nuanced, multi-document synthesis tasks. By structuring knowledge as graphs, we can facilitate the retrieval of context that captures greater semantic depth and enhances language model operations. We explore graph-based RAG methodologies and introduce TREX, a novel, cost-effective alternative that combines graph-based and vector-based retrieval techniques. Our benchmarking across four diverse datasets highlights the strengths of different RAG methodologies, demonstrates TREX's ability to handle multiple open-domain QA types, and reveals the limitations of current evaluation methods. In a real-world technical support case study, we demonstrate how TREX solutions can surpass conventional vector-based RAG in efficiently synthesizing data from heterogeneous sources. Our findings underscore the potential of augmenting large language models with advanced retrieval and orchestration capabilities, advancing scalable, graph-based AI solutions.

Paper Structure

This paper contains 38 sections, 2 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Percentage of OLTP-style versus OLAP-style queries in each benchmark.
  • Figure 2: An example of a hierarchical tree structure built from a set of text chunks, culminating in a single root node. The LLM-generated summary nodes are then inserted into a vector database [youtube_raptor_long_context].
  • Figure 3: Accuracy of answers generated by TREX, GraphRAG, RAPTOR, Azure AI Hybrid Search, and Oracle on the OLTP benchmarks.
  • Figure 4: Mean win rates of TREX versus RAPTOR, GraphRAG Global Search, and Hybrid Search on the metrics of comprehensiveness, diversity, and empowerment for the two OLAP-style benchmarks. Statistically significant results are marked with an asterisk. Global Search results use community level 4, the hierarchy level that produced the best responses.
  • Figure 5: Architecture of the data preparation system in Azure Data Copilot, showing where TREX can be applied to improve contextual retrieval.
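Figure 4 reports mean win rates and marks significant results with an asterisk. The exact scoring scheme and significance test are not stated here, so the sketch below makes three assumptions. Pairwise judgments are labeled win, tie, or loss from TREX's perspective. A tie counts as half a win. Significance comes from an exact two-sided sign (binomial) test on the non-tied judgments. The function names and the example counts are hypothetical and are not results from the paper.

```python
from math import comb

# Assumed scoring: a tie counts as half a win for the system being evaluated.
SCORES = {"win": 1.0, "tie": 0.5, "loss": 0.0}

def mean_win_rate(outcomes: list[str]) -> float:
    """Average score over pairwise judgments labeled win / tie / loss."""
    return sum(SCORES[o] for o in outcomes) / len(outcomes)

def sign_test_p(wins: int, losses: int) -> float:
    """Exact two-sided binomial (sign) test against p = 0.5; ties are excluded."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def report(outcomes: list[str], alpha: float = 0.05) -> str:
    """Format the win rate, appending '*' when the sign test is significant."""
    rate = mean_win_rate(outcomes)
    p = sign_test_p(outcomes.count("win"), outcomes.count("loss"))
    return f"{rate:.2f}{'*' if p < alpha else ''}"

# Illustrative counts only: 14 wins, 4 losses, 2 ties against one baseline.
outcomes = ["win"] * 14 + ["loss"] * 4 + ["tie"] * 2
print(report(outcomes))  # prints: 0.75*
```

Dropping ties before the sign test keeps the null hypothesis simple: among decisive judgments, each system is equally likely to win. Other choices are possible, such as a paired permutation test over per-question scores.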