G-RAG: Knowledge Expansion in Material Science

Radeen Mostafa, Mirza Nihal Baig, Mashaekh Tausif Ehsan, Jakir Hasan

TL;DR

This work addresses the challenge of unreliable information retrieval and hallucinations in LLM-driven material science queries by introducing Graph RAG and its material-science specialization, G-RAG. The proposed pipeline leverages a knowledge graph and external KBs (e.g., Wikipedia) with entity linking, Span Parser-based retrieval, and a Passage Processor to produce precise, context-rich outputs. Empirical results across ten handwritten queries show that G-RAG improves correctness and faithfulness by harnessing domain-specific knowledge, though context relevancy varies and remains sensitive to knowledge integration quality. The approach demonstrates the potential of domain-aware graph-based retrieval to enhance factual precision and contextual grounding in specialized scientific domains, with plans to scale knowledge bases and entity linking for broader applicability.

Abstract

In the field of Material Science, effective information retrieval systems are essential for facilitating research. Traditional Retrieval-Augmented Generation (RAG) approaches in Large Language Models (LLMs) often encounter challenges such as outdated information, hallucinations, limited interpretability due to context constraints, and inaccurate retrieval. To address these issues, Graph RAG integrates graph databases to enhance the retrieval process. Our proposed method processes Material Science documents by extracting key entities (referred to as MatIDs) from sentences, which are then used to query external Wikipedia knowledge bases (KBs) for additional relevant information. We implement an agent-based parsing technique to achieve a more detailed representation of the documents. Our improved version of Graph RAG, called G-RAG, further leverages a graph database to capture relationships between these entities, improving both retrieval accuracy and contextual understanding. This enhanced approach improves performance in domains that require precise information retrieval, such as Material Science.
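The flow described in the abstract (extract entities from sentences, look them up in an external KB, and link them in a graph that is consulted at query time) can be illustrated with a minimal sketch. This is not the authors' implementation: `TOY_KB`, `extract_entities`, `build_graph`, and `retrieve_context` are hypothetical names, the dictionary stands in for Wikipedia, naive token matching stands in for entity linking, and sentence-level co-occurrence stands in for relation extraction.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical stand-in for an external KB such as Wikipedia (assumption).
TOY_KB = {
    "graphene": "A single layer of carbon atoms in a hexagonal lattice.",
    "silicon": "A chemical element widely used in semiconductors.",
    "perovskite": "A class of materials with the calcium titanate crystal structure.",
}

def extract_entities(sentence):
    """Return KB entities mentioned in a sentence (naive lowercase token match)."""
    tokens = {tok.strip(".,;:()?!").lower() for tok in sentence.split()}
    return sorted(tokens & set(TOY_KB))

def build_graph(sentences):
    """Link entities that co-occur in the same sentence (undirected edges)."""
    graph = defaultdict(set)
    for sentence in sentences:
        for a, b in combinations(extract_entities(sentence), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def retrieve_context(query, graph):
    """Collect KB text for entities in the query and their graph neighbours."""
    seeds = extract_entities(query)
    related = set(seeds)
    for entity in seeds:
        related |= graph.get(entity, set())
    return {entity: TOY_KB[entity] for entity in sorted(related)}

sentences = [
    "Graphene can be grown on silicon substrates.",
    "Perovskite solar cells are often stacked on silicon.",
]
graph = build_graph(sentences)
context = retrieve_context("What is graphene?", graph)
```

In this toy setting, a query about graphene also pulls in the KB entry for silicon, because the two entities are connected by an edge in the graph; that expanded context is what would be passed to the LLM.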

Paper Structure

This paper contains 22 sections, 29 equations, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Architecture of G-RAG System
  • Figure 2: Document Parsing
  • Figure 3: Validity Check by Agent System
  • Figure 4: Entity Linking and Relation Extraction
  • Figure 5: Entity Linking
  • ...and 4 more figures