DrKGC: Dynamic Subgraph Retrieval-Augmented LLMs for Knowledge Graph Completion across General and Biomedical Domains

Yongkang Xiao, Sinian Zhang, Yi Dai, Huixue Zhou, Jue Hou, Jie Ding, Rui Zhang

TL;DR

DrKGC is a dynamic, retrieval-augmented LLM framework for knowledge graph completion (KGC). It preserves graph structure through three components: a lightweight pretraining stage, rule-based subgraph retrieval, and a GCN adapter that produces local embeddings to enrich LLM prompts. DrKGC converts KG queries into question templates, ranks candidate entities, and constructs a subgraph bottom-up under the guidance of learned rules. It achieves state-of-the-art results on four datasets, including two biomedical KGs, and its explicit subgraph reasoning improves interpretability. The approach remains robust under inductive and noisy conditions, and the results highlight the importance of combining structural signals with prompt design and LLM reasoning.

Abstract

Knowledge graph completion (KGC) aims to predict missing triples in knowledge graphs (KGs) by leveraging existing triples and textual information. Recently, generative large language models (LLMs) have been increasingly employed for graph tasks. However, current approaches typically encode graph context in textual form, which fails to fully exploit the potential of LLMs for perceiving and reasoning about graph structures. To address this limitation, we propose DrKGC (Dynamic Subgraph Retrieval-Augmented LLMs for Knowledge Graph Completion). DrKGC employs a flexible lightweight model training strategy to learn structural embeddings and logical rules within the KG. It then leverages a novel bottom-up graph retrieval method to extract a subgraph for each query guided by the learned rules. Finally, a graph convolutional network (GCN) adapter uses the retrieved subgraph to enhance the structural embeddings, which are then integrated into the prompt for effective LLM fine-tuning. Experimental results on two general domain benchmark datasets and two biomedical datasets demonstrate the superior performance of DrKGC. Furthermore, a realistic case study in the biomedical domain highlights its interpretability and practical utility.
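To make the "GCN adapter" step more concrete, the following is a minimal sketch of one generic graph-convolution step over a small retrieved subgraph, using the standard symmetric normalization ReLU(D^-1/2 (A + I) D^-1/2 H W). It is not the authors' adapter: the function name `gcn_layer`, the toy subgraph, the feature vectors, and the identity weight matrix are all assumptions chosen for illustration, and the abstract does not specify the adapter's architecture.

```python
import math

def gcn_layer(num_nodes, edges, features, weight):
    """One generic GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W). Illustrative only."""
    # Build an undirected adjacency matrix with self-loops.
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0

    # Symmetric degree normalization (degrees include the self-loop).
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in adj]

    in_dim, out_dim = len(weight), len(weight[0])
    output = []
    for i in range(num_nodes):
        # Aggregate normalized features from node i and its neighbours.
        agg = [0.0] * in_dim
        for j in range(num_nodes):
            if adj[i][j]:
                coeff = inv_sqrt_deg[i] * inv_sqrt_deg[j]
                for k in range(in_dim):
                    agg[k] += coeff * features[j][k]
        # Linear transform followed by ReLU.
        output.append([
            max(0.0, sum(agg[k] * weight[k][o] for k in range(in_dim)))
            for o in range(out_dim)
        ])
    return output

# Hypothetical toy subgraph: node 0 is the query entity, nodes 1 and 2 are
# retrieved neighbours. The features stand in for pretrained structural embeddings.
edges = [(0, 1), (0, 2)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weight, chosen for readability

local = gcn_layer(3, edges, features, weight)
```

In this sketch, `local[0]` would play the role of a subgraph-aware embedding for the query entity. How such an embedding is injected into the LLM prompt is not described in the abstract, so it is left out here.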

Paper Structure

This paper contains 31 sections, 5 figures, and 12 tables.

Figures (5)

  • Figure 1: Overview of the DrKGC framework. Light-blue arrows denote the dataset-level workflow (run once per KG); black arrows denote the per-triple workflow (run for each incomplete triple).
  • Figure 2: Robustness evaluation on WN18RR. (a) Comparison of evaluation metrics under the inductive setting versus the overall test set. (b) Impact of proportional noise addition on model performance.
  • Figure 3: Impact of $\tau$ on DrKGC performance and time consumption on WN18RR.
  • Figure 4: Comparison of DrKGC performance using different LLMs across four datasets.
  • Figure 5: Example of multi-hop mechanism paths from drugs to Breast Cancer. Purple, blue, and orange nodes represent drugs, diseases, and genes/proteins, respectively.