Towards Robust Retrieval-Augmented Generation Based on Knowledge Graph: A Comparative Analysis

Hazem Amamou, Stéphane Gagnon, Alan Davoust, Anderson R. Avila

TL;DR

A comparative analysis of the RAG baseline defined by the Retrieval-Augmented Generation Benchmark (RGB) and several variations of GraphRAG, a knowledge-graph-based RAG system developed to retrieve relevant information from large documents, shows that the GraphRAG variations improve on the RGB baseline.

Abstract

Retrieval-Augmented Generation (RAG) was introduced to enhance the capabilities of Large Language Models (LLMs) beyond their encoded prior knowledge. It does so by providing LLMs with an external source of knowledge, which helps reduce factual hallucinations and gives access to new information not available during pretraining. However, inconsistent retrieved information can negatively affect LLM responses. The Retrieval-Augmented Generation Benchmark (RGB) was introduced to evaluate the robustness of RAG systems under such conditions. In this work, we use the RGB corpus to evaluate LLMs in four scenarios: noise robustness, information integration, negative rejection, and counterfactual robustness. We perform a comparative analysis between the RGB RAG baseline and GraphRAG, a knowledge-graph-based retrieval system, and we test three GraphRAG customizations intended to improve robustness. Results show improvements over the RGB baseline and provide insights for designing more reliable RAG systems for real-world scenarios.
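
To make two of the evaluation scenarios more concrete, the sketch below shows one way a noise-robustness context could be assembled and how accuracy (%) and rejection rate (%) could be scored. It is a minimal illustration under stated assumptions, not the paper's or the RGB's actual code: the function names `build_context`, `accuracy`, and `rejection_rate`, the substring-based answer matching, and the `"insufficient information"` rejection marker are all hypothetical choices made for this example.

```python
import random


def build_context(positive_docs, noise_docs, noise_ratio, k, seed=0):
    """Mix relevant and noisy passages into a context of k passages.

    Assumption: `noise_ratio` is the fraction of the k passages that
    are noise; the rest are taken from the relevant passages.
    """
    rng = random.Random(seed)
    n_noise = round(k * noise_ratio)
    n_pos = k - n_noise
    docs = positive_docs[:n_pos] + noise_docs[:n_noise]
    rng.shuffle(docs)  # avoid a fixed relevant-then-noise ordering
    return docs


def accuracy(responses, answers):
    """Percentage of responses that contain the gold answer.

    Assumption: a case-insensitive substring match counts as correct.
    """
    hits = sum(ans.lower() in resp.lower()
               for resp, ans in zip(responses, answers))
    return 100.0 * hits / len(responses)


def rejection_rate(responses, marker="insufficient information"):
    """Percentage of responses that decline to answer.

    Assumption: a refusal is detected by a fixed marker phrase.
    """
    rejected = sum(marker in resp.lower() for resp in responses)
    return 100.0 * rejected / len(responses)
```

For example, `build_context(relevant, noisy, noise_ratio=0.4, k=5)` would return three relevant and two noisy passages in shuffled order; sweeping `noise_ratio` and plotting `accuracy` gives the kind of curve summarized for the noise robustness task.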

Paper Structure

This paper contains 16 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparative Framework of RGB Benchmark, GraphRAG, and RobustGraphRAG for Evaluating Robustness in Retrieval-Augmented Generation
  • Figure 2: GR$_{RGB}$ prompt presented to the LLM, based on the RGB system combined with structured knowledge.
  • Figure 3: GR$_{ext}$ External-Only GraphRAG Prompt for Robust, Noise-Aware Answering
  • Figure 4: Comparative analysis for the noise robustness task measured by accuracy (%) for GPT-3.5 and GPT-4o-mini under varying noise ratios.
  • Figure 5: Comparative analysis for the negative rejection task measured by rejection rate (%) for GPT-4o-mini and GPT-3.5.