GS-KGC: A Generative Subgraph-based Framework for Knowledge Graph Completion with Large Language Models
Rui Yang, Jiahao Zhu, Jianping Man, Hongze Liu, Li Fang, Yi Zhou
TL;DR
This work introduces GS-KGC, a subgraph-aware generative framework that casts knowledge graph completion as question answering and uses subgraph-derived negatives and neighbors to guide LLM reasoning. By partitioning subgraphs and merging the resulting information into the prompt, GS-KGC directly generates plausible missing entities, including facts beyond the existing KG. The paper reports gains over text-based and LLM-enhanced baselines on static (SKGC) and temporal (TKGC) completion datasets, with ablations showing that negatives are the primary driver of improvement and that neighbors provide complementary context. The approach offers scalable, open-world-capable KGC, with potential for domain-specific pre-training and future extensions.
Abstract
Knowledge graph completion (KGC) focuses on identifying missing triples in a knowledge graph (KG), which is crucial for many downstream applications. Given the rapid development of large language models (LLMs), several LLM-based methods have been proposed for the KGC task. However, most of them focus on prompt engineering and overlook the fact that finer-grained subgraph information can help LLMs generate more accurate answers. In this paper, we propose a novel completion framework called Generative Subgraph-based KGC (GS-KGC), which uses subgraph information as context for reasoning and takes a question-answering (QA) approach to the KGC task. The framework is built around a subgraph partitioning algorithm that generates negatives and neighbors. Specifically, negatives encourage LLMs to generate a broader range of answers, while neighbors provide additional contextual insights for LLM reasoning. Furthermore, we find that GS-KGC can discover both potential triples within the KG and new facts beyond it. Experiments on four common KGC datasets highlight the advantages of GS-KGC: for example, it achieves a 5.6% increase in Hits@3 over the LLM-based model CP-KGC on FB15k-237N, and a 9.3% increase over the LLM-based model TECHS on ICEWS14.
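To make the roles of negatives and neighbors concrete, the following is a minimal sketch of how a subgraph around a query (head, relation, ?) might be partitioned and merged into a QA prompt. It is not the authors' implementation. The abstract does not define the two sets precisely, so the sketch assumes, for illustration only, that negatives are tails already known for the same (head, relation) pair and that neighbors are other triples touching the head entity. The function names `partition_subgraph` and `build_qa_prompt`, the `max_items` cap, and the prompt wording are all hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def partition_subgraph(
    triples: List[Triple], head: str, relation: str, max_items: int = 5
) -> Tuple[List[str], List[Triple]]:
    """Split the triples around `head` into negatives and neighbors.

    Assumed definitions (illustrative, not taken from the paper):
      - negatives: tails already known for the same (head, relation) pair
      - neighbors: any other triple in which `head` appears
    """
    negatives: List[str] = []
    neighbors: List[Triple] = []
    for h, r, t in triples:
        if h == head and r == relation:
            negatives.append(t)
        elif h == head or t == head:
            neighbors.append((h, r, t))
    # Cap both lists so the prompt stays short (the cap is an assumption).
    return negatives[:max_items], neighbors[:max_items]


def build_qa_prompt(
    head: str, relation: str, negatives: List[str], neighbors: List[Triple]
) -> str:
    """Merge the query, neighbors, and negatives into a single QA prompt."""
    lines = [f"Question: ({head}, {relation}, ?) What is the missing tail entity?"]
    if neighbors:
        lines.append("Known facts about the head entity:")
        lines.extend(f"- ({h}, {r}, {t})" for h, r, t in neighbors)
    if negatives:
        lines.append(
            "Answers already known, give a different one: " + ", ".join(negatives)
        )
    lines.append("Answer:")
    return "\n".join(lines)


# Toy knowledge graph, invented for this example.
toy_kg = [
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "EU"),
    ("France", "member_of", "UN"),
    ("Berlin", "capital_of", "Germany"),
]
negs, nbrs = partition_subgraph(toy_kg, "France", "member_of")
print(build_qa_prompt("France", "member_of", negs, nbrs))
```

In this sketch the negatives steer the model away from answers that are already in the graph, which is one plausible reading of how they "encourage LLMs to generate a broader range of answers", while the neighbors supply local context about the head entity. The resulting string would be passed to an LLM, and that step is omitted here.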
