Pseudo-Knowledge Graph: Meta-Path Guided Retrieval and In-Graph Text for RAG-Equipped LLM
Yuxin Yang, Haoyang Wu, Tao Wang, Jia Yang, Hao Ma, Guojie Luo
TL;DR
This work introduces the Pseudo-Knowledge Graph (PKG), a retrieval framework that augments LLMs with a hybrid storage-and-retrieval system combining knowledge graphs, in-graph text, and multi-method retrieval (regular expressions, vector space search, and meta-path traversal). By preserving natural language chunks within a graph-structured PKG and leveraging meta-paths for multi-hop reasoning, PKG addresses limitations of traditional RAG and KG approaches in large, complex knowledge bases. Across Open Compass and MultiHop-RAG benchmarks and multiple model sizes, PKG demonstrates superior accuracy and robustness, driven by a combination of rich textual context, diverse retrieval modalities, and adaptive post-processing. The approach offers practical benefits for domains requiring precise, multi-hop factual grounding and scalable knowledge access, with future work aimed at multi-turn conversations, scalability improvements, and interactive knowledge exploration.
Abstract
The advent of Large Language Models (LLMs) has revolutionized natural language processing. However, these models face challenges in retrieving precise information from vast datasets. Retrieval-Augmented Generation (RAG) was developed to combine LLMs with external information retrieval systems and thereby improve the accuracy and context of responses. Despite these improvements, RAG still struggles with comprehensive retrieval in high-volume, low-information-density databases and lacks relational awareness, which leads to fragmented answers. To address this, this paper introduces the Pseudo-Knowledge Graph (PKG) framework, which integrates Meta-path Retrieval, In-graph Text and Vector Retrieval into LLMs. By preserving natural language text and leveraging several retrieval techniques, the PKG offers a richer knowledge representation and improves accuracy in information retrieval. Extensive evaluations using Open Compass and MultiHop-RAG benchmarks demonstrate the framework's effectiveness in managing large volumes of data and complex relationships.
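To make the combination of retrieval methods concrete, the following is a minimal Python sketch of a toy graph that stores text chunks as nodes alongside entities and queries it by regular expression, by vector similarity, and by meta-path. It is an illustration under stated assumptions, not the authors' implementation: the node and edge data, the function names (`regex_search`, `vector_search`, `meta_path_search`), and the bag-of-words cosine similarity (a stand-in for a learned embedding model) are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy graph: entity nodes plus "Chunk" nodes that keep raw text in the graph.
NODES = {
    "e1": {"type": "Person", "name": "Ada Lovelace"},
    "e2": {"type": "Machine", "name": "Analytical Engine"},
    "e3": {"type": "Person", "name": "Charles Babbage"},
    "c1": {"type": "Chunk", "text": "Ada Lovelace wrote notes on the Analytical Engine."},
    "c2": {"type": "Chunk", "text": "Charles Babbage designed the Analytical Engine."},
}
EDGES = [("e1", "e2"), ("e3", "e2"), ("e1", "c1"), ("e2", "c1"), ("e3", "c2"), ("e2", "c2")]

# Undirected adjacency list built from the edge list.
ADJ = defaultdict(set)
for a, b in EDGES:
    ADJ[a].add(b)
    ADJ[b].add(a)


def regex_search(pattern):
    """Return ids of chunk nodes whose text matches the pattern (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(n for n, d in NODES.items() if d["type"] == "Chunk" and rx.search(d["text"]))


def _bow(text):
    """Bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_search(query, k=1):
    """Return the k chunk ids most similar to the query under cosine similarity."""
    q = _bow(query)
    scored = [(_cosine(q, _bow(d["text"])), n) for n, d in NODES.items() if d["type"] == "Chunk"]
    return [n for _, n in sorted(scored, key=lambda s: (-s[0], s[1]))[:k]]


def meta_path_search(start, type_path):
    """Return paths from start whose node types follow type_path, without revisiting nodes."""
    if NODES[start]["type"] != type_path[0]:
        return []
    paths = [[start]]
    for wanted in type_path[1:]:
        paths = [
            p + [n]
            for p in paths
            for n in sorted(ADJ[p[-1]])
            if NODES[n]["type"] == wanted and n not in p
        ]
    return paths


print(regex_search(r"babbage"))
print(vector_search("who designed the engine"))
print(meta_path_search("e1", ["Person", "Machine", "Person"]))
```

In this sketch, the meta-path `Person -> Machine -> Person` links two people through a shared machine, which is the kind of multi-hop relation that a flat chunk index does not expose, while the chunk nodes keep the original wording available to pass to the LLM as context.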
