LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities
Yuqi Zhu, Xiaohan Wang, Jing Chen, Shuofei Qiao, Yixin Ou, Yunzhi Yao, Shumin Deng, Huajun Chen, Ningyu Zhang
TL;DR
The paper systematically evaluates GPT-4 and related LLMs on KG construction and reasoning across eight datasets. It finds that while LLMs are not yet best-in-class extractors, they excel as inference assistants and show strong reasoning capabilities that can surpass fine-tuned models in some cases. It introduces Virtual Knowledge Extraction (VINE) to probe generalization beyond memorized knowledge and demonstrates GPT-4’s notable ability to acquire new extraction skills from instructions. To operationalize these insights, the authors propose AutoKG, a multi-agent framework that couples LLMs with external sources for collaborative KG construction and reasoning. Together, these contributions offer a roadmap for leveraging LLMs in KG pipelines, balancing prompt design, external retrieval, and human-in-the-loop validation to improve accuracy and scalability. The work highlights both the promise and the current limitations of LLM-driven KG construction and reasoning, and points to future directions in automatic, interactive KG systems and multimodal reasoning.
Abstract
This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We conduct experiments on eight diverse datasets spanning four representative tasks (entity and relation extraction, event extraction, link prediction, and question answering), thereby thoroughly exploring LLMs' performance on both KG construction and inference. Empirically, our findings suggest that LLMs, represented by GPT-4, are better suited as inference assistants than as few-shot information extractors. Specifically, while GPT-4 performs well on tasks related to KG construction, it excels further in reasoning tasks, surpassing fine-tuned models in certain cases. Moreover, we extend our investigation to the potential generalization ability of LLMs for information extraction, which leads us to propose a Virtual Knowledge Extraction task and to develop the corresponding VINE dataset. Based on these empirical findings, we further propose AutoKG, a multi-agent-based approach that employs LLMs and external sources for KG construction and reasoning. We anticipate that this research can provide valuable insights for future work in the field of knowledge graphs. The code and datasets are available at https://github.com/zjunlp/AutoKG.
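The abstract reports quantitative results for entity and relation extraction. To illustrate how extracted knowledge can be scored against a reference KG, here is a minimal sketch. It assumes that predictions and gold annotations are both sets of (head, relation, tail) triples, and that two triples match when they are identical after lowercasing and stripping whitespace. The names `Triple`, `normalize`, and `triple_f1` are hypothetical. This is not necessarily the metric or matching rule used in the paper's evaluation.

```python
from typing import Iterable, Set, Tuple

# Hypothetical triple representation: (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Lowercase and strip each element (an assumed matching rule) so that
    trivial formatting differences like 'Paris ' vs 'paris' are ignored."""
    head, relation, tail = triple
    return (head.strip().lower(), relation.strip().lower(), tail.strip().lower())


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over deduplicated sets of triples."""
    pred_set: Set[Triple] = {normalize(t) for t in predicted}
    gold_set: Set[Triple] = {normalize(t) for t in gold}
    correct = len(pred_set & gold_set)  # triples both predicted and annotated
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("Marie Curie", "born_in", "Warsaw"), ("Marie Curie", "field", "physics")]
    pred = [("marie curie", "born_in", "Warsaw"), ("Marie Curie", "award", "Nobel Prize")]
    print(triple_f1(pred, gold))  # (0.5, 0.5, 0.5)
```

Exact matching is strict. An LLM that outputs a correct fact with a paraphrased relation name (for example `birthplace` instead of `born_in`) is scored as wrong under this rule. This is one reason a free-form generator can look weaker as an extractor than a model fine-tuned on a fixed schema.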
