GS-KGC: A Generative Subgraph-based Framework for Knowledge Graph Completion with Large Language Models

Rui Yang, Jiahao Zhu, Jianping Man, Hongze Liu, Li Fang, Yi Zhou

TL;DR

This work introduces GS-KGC, a subgraph-aware generative framework for knowledge graph completion that treats the task as question answering (QA) and uses subgraph-derived negatives and neighbors to guide LLM reasoning. By partitioning subgraphs and applying information-merge strategies, GS-KGC directly generates plausible missing entities, including facts beyond the existing KG. The paper reports empirical gains over text-based and LLM-enhanced baselines on static (SKG) and temporal (TKGC) knowledge graph completion datasets, with ablations showing negatives as the primary driver of improvement and neighbors providing complementary context. The approach offers scalable, open-world-capable KGC with potential for domain-specific pre-training and future extensions.

Abstract

Knowledge graph completion (KGC) focuses on identifying missing triples in a knowledge graph (KG), which is crucial for many downstream applications. Given the rapid development of large language models (LLMs), several LLM-based methods have been proposed for the KGC task. However, most of them focus on prompt engineering while overlooking the fact that finer-grained subgraph information can help LLMs generate more accurate answers. In this paper, we propose a novel completion framework called Generative Subgraph-based KGC (GS-KGC), which uses subgraph information as context for reasoning and employs a QA approach to perform the KGC task. The framework primarily includes a subgraph partitioning algorithm designed to generate negatives and neighbors. Specifically, negatives encourage LLMs to generate a broader range of answers, while neighbors provide additional contextual insights for LLM reasoning. Furthermore, we found that GS-KGC can discover potential triples within the KGs and new facts beyond the KGs. Experiments conducted on four common KGC datasets highlight the advantages of the proposed GS-KGC, e.g., it shows a 5.6% increase in Hits@3 compared to the LLM-based model CP-KGC on FB15k-237N, and a 9.3% increase over the LLM-based model TECHS on ICEWS14.
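
The abstract reports results as Hits@3. As a rough illustration of the metric only, the sketch below computes Hits@k as the fraction of queries whose gold entity appears among the first k candidates. The function name `hits_at_k`, the list-of-candidates input format, and the toy entity names are assumptions made for this example; the paper's own evaluation protocol (for instance, how generated answers are matched to entities) may differ.

```python
from typing import Sequence


def hits_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    gold_answers: Sequence[str],
    k: int,
) -> float:
    """Fraction of queries whose gold answer is among the top-k candidates.

    ranked_predictions[i] is a hypothetical ranked candidate list for
    query i (best first); gold_answers[i] is the expected entity.
    """
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("predictions and gold answers must align")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not gold_answers:
        return 0.0
    hits = sum(
        1
        for candidates, gold in zip(ranked_predictions, gold_answers)
        if gold in candidates[:k]
    )
    return hits / len(gold_answers)


# Toy data with placeholder entity names (not taken from any dataset).
predictions = [
    ["a1", "a2", "a3"],  # gold "a2" is at rank 2
    ["b1", "b2", "b3"],  # gold "b9" is not predicted
    ["c1", "c2", "c3"],  # gold "c1" is at rank 1
]
gold = ["a2", "b9", "c1"]

hits1 = hits_at_k(predictions, gold, k=1)
hits3 = hits_at_k(predictions, gold, k=3)
```

On this toy input, one of three queries is correct at rank 1 and two of three within the top 3, so Hits@1 is 1/3 and Hits@3 is 2/3.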

Paper Structure (22 sections, 17 equations, 7 figures, 5 tables)

This paper contains 22 sections, 17 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Compared with previous KGC methods, generative KGC can discover new facts outside the KG.
  • Figure 2: GS-KGC model architecture.
  • Figure 3: Illustration of the negatives and neighbors information used for subgraph segmentation. Negatives are other answers for $(e, r_1)$ within the training set, neighbors are triples connected to $e$ in the training set whose relation is not $r_1$, and $x$ represents other answers for $(e, r_1)$ in the test set.
  • Figure 4: Actual completion results of the LLM under the closed-world assumption (CWA).
  • Figure 5: Line chart of Hits@1 on different datasets as the parameter M varies.
  • ...and 2 more figures
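
The Figure 3 caption defines negatives and neighbors for a query $(e, r_1)$. A minimal sketch of that partitioning, based only on the caption's wording, might look as follows. The function name `partition_subgraph`, the `(head, relation, tail)` tuple layout, and the optional `answer` argument (used to exclude the target answer so that only "other answers" remain) are assumptions for illustration, not the authors' implementation, which may also filter, sample, or cap these sets.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail); layout assumed here


def partition_subgraph(
    train_triples: Iterable[Triple],
    entity: str,
    relation: str,
    answer: Optional[str] = None,
) -> Tuple[List[str], List[Triple]]:
    """Split the training triples around the query (entity, relation).

    negatives: other tails for (entity, relation) in the training set,
               excluding `answer` if one is given.
    neighbors: training triples that touch `entity` through a relation
               other than `relation`.
    """
    negatives: List[str] = []
    neighbors: List[Triple] = []
    for head, rel, tail in train_triples:
        if rel == relation:
            # Same relation: only tails of the query entity count as negatives.
            if head == entity and tail != answer:
                negatives.append(tail)
        elif entity in (head, tail):
            # Different relation, but connected to the query entity.
            neighbors.append((head, rel, tail))
    return negatives, neighbors


# Toy training set with placeholder names mirroring the caption's symbols.
train = [
    ("e", "r1", "a1"),
    ("e", "r1", "a2"),
    ("e", "r2", "b"),
    ("c", "r3", "e"),
    ("d", "r1", "a1"),  # same relation, different entity: ignored
]

negatives, neighbors = partition_subgraph(train, "e", "r1", answer="a1")
```

Here `negatives` holds `a2`, the other known answer for `(e, r1)`, and `neighbors` holds the two triples that connect to `e` through `r2` and `r3`.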