
Pseudo-Knowledge Graph: Meta-Path Guided Retrieval and In-Graph Text for RAG-Equipped LLM

Yuxin Yang, Haoyang Wu, Tao Wang, Jia Yang, Hao Ma, Guojie Luo

TL;DR

This work introduces the Pseudo-Knowledge Graph (PKG), a retrieval framework that augments LLMs with a hybrid storage-and-retrieval system combining knowledge graphs, in-graph text, and multi-method retrieval (regular expressions, vector space search, and meta-path traversal). By preserving natural language chunks within a graph-structured PKG and leveraging meta-paths for multi-hop reasoning, PKG addresses limitations of traditional RAG and KG approaches in large, complex knowledge bases. Across Open Compass and MultiHop-RAG benchmarks and multiple model sizes, PKG demonstrates superior accuracy and robustness, driven by a combination of rich textual context, diverse retrieval modalities, and adaptive post-processing. The approach offers practical benefits for domains requiring precise, multi-hop factual grounding and scalable knowledge access, with future work aimed at multi-turn conversations, scalability improvements, and interactive knowledge exploration.
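
To make "meta-paths for multi-hop reasoning" over in-graph text concrete, here is a minimal Python sketch. It assumes a toy graph in which entity nodes link to the text chunk nodes that mention them (in the spirit of Figure 4), and it treats a meta-path as a sequence of node-type labels. The class `PseudoKG`, its methods, and the type labels `"Entity"` and `"Chunk"` are hypothetical illustrations, not the paper's implementation.

```python
from collections import defaultdict


class PseudoKG:
    """Hypothetical toy PKG (sketch only): typed nodes carrying text, joined by undirected edges."""

    def __init__(self):
        self.node_type = {}          # node id -> type label, e.g. "Entity" or "Chunk" (assumed labels)
        self.text = {}               # node id -> natural-language text stored inside the graph
        self.adj = defaultdict(set)  # node id -> ids of neighbouring nodes

    def add_node(self, node_id, node_type, text=""):
        self.node_type[node_id] = node_type
        self.text[node_id] = text

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def meta_path_paths(self, start, meta_path):
        """Return every simple path from `start` whose node types follow `meta_path`."""
        if self.node_type.get(start) != meta_path[0]:
            return []
        paths = [[start]]
        for wanted_type in meta_path[1:]:
            extended = []
            for path in paths:
                for nxt in sorted(self.adj[path[-1]]):
                    # Follow only neighbours of the required type, never revisiting a node.
                    if self.node_type[nxt] == wanted_type and nxt not in path:
                        extended.append(path + [nxt])
            paths = extended
        return paths

    def path_text(self, path):
        """Collect the in-graph text of the chunk nodes along a path, to pass to the LLM as context."""
        return [self.text[n] for n in path if self.node_type[n] == "Chunk"]


# Usage: a two-hop question ("Which country is the element Marie Curie co-discovered named after?")
kg = PseudoKG()
for name in ("Marie Curie", "Polonium", "Poland"):
    kg.add_node(name, "Entity")
kg.add_node("c1", "Chunk", "Marie Curie co-discovered polonium in 1898.")
kg.add_node("c2", "Chunk", "Polonium was named after Poland.")
for a, b in [("Marie Curie", "c1"), ("Polonium", "c1"), ("Polonium", "c2"), ("Poland", "c2")]:
    kg.add_edge(a, b)

meta_path = ["Entity", "Chunk", "Entity", "Chunk", "Entity"]
paths = kg.meta_path_paths("Marie Curie", meta_path)
evidence = kg.path_text(paths[0])
print(paths)     # [['Marie Curie', 'c1', 'Polonium', 'c2', 'Poland']]
print(evidence)  # ['Marie Curie co-discovered polonium in 1898.', 'Polonium was named after Poland.']
```

Returning the chunk text along the path, rather than only the entity-relation triples, is what lets the model see the original sentences that justify each hop.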

Abstract

The advent of Large Language Models (LLMs) has revolutionized natural language processing. However, these models face challenges in retrieving precise information from vast datasets. Retrieval-Augmented Generation (RAG) was developed to combine LLMs with external information retrieval systems, improving the accuracy and context of their responses. Despite these improvements, RAG still struggles with comprehensive retrieval in high-volume, low-information-density databases and lacks relational awareness, which leads to fragmented answers. To address these shortcomings, this paper introduces the Pseudo-Knowledge Graph (PKG) framework, which integrates Meta-path Retrieval, In-graph Text, and Vector Retrieval into LLMs. By preserving natural language text and combining several retrieval techniques, the PKG offers a richer knowledge representation and improves retrieval accuracy. Extensive evaluations on the Open Compass and MultiHop-RAG benchmarks demonstrate the framework's effectiveness in managing large volumes of data and complex relationships.
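
The abstract lists Vector Retrieval among the integrated methods. The following is a minimal sketch of similarity-ranked retrieval over text chunk nodes, assuming a toy bag-of-words `embed` function as a stand-in for the neural embedding model a real system would use. The names `embed`, `cosine`, and `vector_retrieve` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in (assumption) for a neural embedding model: a bag of lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (missing terms count as 0)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def vector_retrieve(query, chunks, k=2):
    """Rank text chunk nodes (id -> text) by similarity to the query and return the top-k ids."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda cid: cosine(q, embed(chunks[cid])), reverse=True)
    return ranked[:k]


chunks = {
    "c1": "Marie Curie co-discovered polonium in 1898.",
    "c2": "Polonium was named after Poland.",
    "c3": "The Eiffel Tower is in Paris.",
}
top = vector_retrieve("Who discovered polonium?", chunks, k=2)
print(top)  # ['c1', 'c2']
```

Swapping `embed` for a real sentence-embedding model leaves the ranking logic unchanged; only the vector representation differs.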

Paper Structure

This paper contains 33 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: The overall framework of our PKG approach. We enhance LLMs by integrating diverse methods for building the PKG and retrieving from it.
  • Figure 2: The extraction of entities and relations in the PKG Builder. After transforming raw data into source text, we use two distinct approaches: traditional NLP methods and modern LLM-based techniques. We also employ LLMs to review and verify the information extracted by the traditional NLP methods.
  • Figure 3: Nodes and Their Properties. (a) illustrates the components of a basic node; (b) presents an example of two entity nodes extracted from a single text chunk node.
  • Figure 4: The organization of text data within a PKG Storage System. Each entity node must be connected to at least one source text chunk node.
  • Figure 5: PKG Retriever. The retrieval process begins with a user query, from which we derive three retrieval inputs: the query itself, the entities it contains, and hypothetical answers. The retrieval methods fall into three types: Regular Expression Retrieval, which uses regular expressions to identify nodes and their relations; Vector Retrieval, which uses vector-based methods to find relevant nodes and their associated relations; and Meta-path Retrieval, which explores start nodes and their connections through specified meta-paths. The light yellow boxes show what the PKG Retriever returns.
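
Figures 4 and 5 describe two concrete mechanisms: every entity node must connect to at least one source text chunk node, and Regular Expression Retrieval matches the query against nodes. Below is a minimal Python sketch of both, assuming entities are identified by their surface names and stored in a dict that maps each name to its chunk ids. `regex_retrieve`, `check_entities_grounded`, and `chunks_for_entities` are hypothetical helpers, and whole-word, case-insensitive matching is just one simple choice of pattern, not necessarily the one used in the paper.

```python
import re


def regex_retrieve(query, entity_names):
    """Hypothetical helper: return entity names that appear in the query as whole words (case-insensitive)."""
    hits = []
    for name in entity_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, query, flags=re.IGNORECASE):
            hits.append(name)
    return hits


def check_entities_grounded(entity_to_chunks):
    """Enforce the Figure 4 rule: every entity node links to at least one source text chunk node."""
    missing = [e for e, cids in entity_to_chunks.items() if not cids]
    if missing:
        raise ValueError(f"entities without source chunks: {missing}")


def chunks_for_entities(entities, entity_to_chunks):
    """Gather the chunk ids linked to the matched entities, keeping first-seen order without duplicates."""
    ordered = []
    for e in entities:
        for cid in entity_to_chunks.get(e, []):
            if cid not in ordered:
                ordered.append(cid)
    return ordered


# Assumed toy mapping from entity name to the ids of chunks that mention it.
entity_to_chunks = {"Marie Curie": ["c1"], "Polonium": ["c1", "c2"], "Poland": ["c2"]}
check_entities_grounded(entity_to_chunks)
matched = regex_retrieve("Which country was polonium named after?", list(entity_to_chunks))
context_ids = chunks_for_entities(matched, entity_to_chunks)
print(matched)      # ['Polonium']
print(context_ids)  # ['c1', 'c2']
```

Because every entity is guaranteed at least one chunk, any regex hit can always be expanded into source text for the LLM.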