Knowledge Graph-extended Retrieval Augmented Generation for Question Answering
Jasper Linders, Jakub M. Tomczak
TL;DR
This work addresses the limitations of LLMs and KGs in isolation by proposing KG-RAG, a training-free system that combines KG-based retrieval with LLM reasoning. A new question decomposition module generates a chain-of-thought followed by sub-questions to improve multi-hop retrieval and explainability while remaining generalizable across KGs. Evaluations on the MetaQA benchmark show improved accuracy for multi-hop questions, with a small trade-off on single-hop cases, and qualitative analysis highlights enhanced transparency through explicit reasoning traces. The approach demonstrates that bridging unstructured natural language understanding with structured KG retrieval can yield more interpretable QA systems without domain-specific training, with potential for broader KG-enabled AI applications.
Abstract
Large Language Models (LLMs) and Knowledge Graphs (KGs) offer a promising approach to robust and explainable Question Answering (QA). While LLMs excel at natural language understanding, they suffer from knowledge gaps and hallucinations. KGs provide structured knowledge but lack natural language interaction. Ideally, an AI system should be both robust to missing facts and easy to communicate with. This paper proposes such a system, which integrates LLMs and KGs without requiring training, ensuring adaptability across different KGs with minimal human effort. The resulting approach can be classified as a specific form of Retrieval Augmented Generation (RAG) with a KG and is therefore dubbed Knowledge Graph-extended Retrieval Augmented Generation (KG-RAG). It includes a question decomposition module to enhance multi-hop information retrieval and answer explainability. Using In-Context Learning (ICL) and Chain-of-Thought (CoT) prompting, it generates explicit reasoning chains that are processed separately to improve truthfulness. Experiments on the MetaQA benchmark show increased accuracy for multi-hop questions, though with a slight trade-off in single-hop performance compared to LLM-with-KG baselines. These findings demonstrate KG-RAG's potential to improve transparency in QA by bridging unstructured language understanding with structured knowledge retrieval.
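
To make the decompose-then-retrieve idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy KG of invented movie-style triples, and it replaces the LLM decomposition step with a hard-coded stub; the names `TRIPLES`, `build_index`, `decompose`, and `answer` are all hypothetical. It only illustrates how a multi-hop question can be split into single-hop lookups whose intermediate results form a readable trace.

```python
from collections import defaultdict

# Toy KG as (head, relation, tail) triples. Invented for illustration only.
TRIPLES = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Interstellar", "directed_by", "Christopher Nolan"),
    ("Inception", "starred_actors", "Leonardo DiCaprio"),
    ("Titanic", "starred_actors", "Leonardo DiCaprio"),
    ("Titanic", "directed_by", "James Cameron"),
]


def build_index(triples):
    """Index triples for forward lookups and inverse ("_inv") lookups."""
    index = defaultdict(set)
    for head, rel, tail in triples:
        index[(head, rel)].add(tail)
        index[(tail, rel + "_inv")].add(head)
    return index


def decompose(question):
    """Stand-in for the LLM decomposition step (assumption: in a real system
    an LLM would produce the reasoning chain and sub-questions via prompting).
    Here the plan is canned so that the example runs offline."""
    canned = {
        "Who directed the movies starring Leonardo DiCaprio?": {
            "reasoning": (
                "First find the movies Leonardo DiCaprio starred in, "
                "then find who directed those movies."
            ),
            "start": "Leonardo DiCaprio",
            "hops": ["starred_actors_inv", "directed_by"],
        }
    }
    return canned[question]


def answer(question, index):
    """Resolve each hop separately and record the intermediate entities."""
    plan = decompose(question)
    frontier = {plan["start"]}
    trace = []
    for rel in plan["hops"]:
        next_frontier = set()
        for entity in frontier:
            next_frontier |= index.get((entity, rel), set())
        trace.append((rel, sorted(next_frontier)))
        frontier = next_frontier
    return sorted(frontier), trace


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    answers, trace = answer(
        "Who directed the movies starring Leonardo DiCaprio?", idx
    )
    for rel, entities in trace:
        print(rel, "->", entities)
    print("answer:", answers)
```

The per-hop trace is what carries the explainability in this sketch: each intermediate entity set can be inspected independently of the final answer.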
