LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities

Yuqi Zhu; Xiaohan Wang; Jing Chen; Shuofei Qiao; Yixin Ou; Yunzhi Yao; Shumin Deng; Huajun Chen; Ningyu Zhang

TL;DR

The paper systematically evaluates GPT-4 and related LLMs on KG construction and reasoning across eight datasets, revealing that while LLMs are not yet best-in-class extractors, they excel as inference assistants and show strong reasoning capabilities that can surpass fine-tuned models in some cases. It introduces Virtual Knowledge Extraction (VINE) to probe generalization beyond memorized knowledge and demonstrates GPT-4’s notable ability to acquire new extraction skills from instructions. To operationalize these insights, the authors propose AutoKG, a multi-agent framework that couples LLMs with external sources for collaborative KG construction and reasoning. Together, these contributions offer a roadmap for leveraging LLMs in KG pipelines, balancing prompt design, external retrieval, and human-in-the-loop validation to improve accuracy and scalability. The work highlights both the promise and current limitations of LLM-driven KG work and points to future directions in automatic, interactive KG systems and multimodal reasoning.

Abstract

This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We conduct experiments on eight diverse datasets, covering four representative tasks: entity and relation extraction, event extraction, link prediction, and question answering. Together, these experiments thoroughly explore LLMs' performance in both construction and inference. Empirically, our findings suggest that LLMs, represented by GPT-4, are better suited as inference assistants than as few-shot information extractors. Specifically, while GPT-4 performs well on tasks related to KG construction, it excels further on reasoning tasks, surpassing fine-tuned models in certain cases. Moreover, our investigation extends to the potential generalization ability of LLMs for information extraction. This leads us to propose a Virtual Knowledge Extraction task and to build the corresponding VINE dataset. Based on these empirical findings, we further propose AutoKG, a multi-agent approach that employs LLMs and external sources for KG construction and reasoning. We anticipate that this research can provide valuable insights for future work on knowledge graphs. The code and datasets are available at https://github.com/zjunlp/AutoKG.
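The evaluation compares zero-shot and one-shot prompting (see Figure 1). The sketch below shows one way such prompts could be assembled for relation extraction, and how a free-text model reply could be parsed back into triples. The function names `build_re_prompt` and `parse_triples`, the instruction wording, the `head | relation | tail` answer format, and the relation labels are hypothetical. They are not the prompts used in the paper. No model is called; the reply string stands in for an LLM response.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def build_re_prompt(sentence: str, relation_types: List[str],
                    demo: Optional[Tuple[str, List[Triple]]] = None) -> str:
    """Hypothetical RE prompt: demo=None gives zero-shot; one (sentence, triples) pair gives one-shot."""
    lines = [
        "Extract (head, relation, tail) triples from the sentence.",
        "Allowed relations: " + ", ".join(relation_types),
        "Answer with one triple per line as: head | relation | tail",
    ]
    if demo is not None:
        # One-shot: prepend a single worked demonstration before the query.
        demo_sentence, demo_triples = demo
        lines.append(f"Sentence: {demo_sentence}")
        lines.extend(f"{h} | {r} | {t}" for h, r, t in demo_triples)
    lines.append(f"Sentence: {sentence}")
    return "\n".join(lines)


def parse_triples(reply: str, relation_types: List[str]) -> List[Triple]:
    """Keep only well-formed reply lines whose relation belongs to the allowed schema."""
    allowed = set(relation_types)
    triples: List[Triple] = []
    for line in reply.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts) and parts[1] in allowed:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


if __name__ == "__main__":
    relations = ["Used-for", "Part-of"]  # illustrative labels only
    print(build_re_prompt("BERT is used for relation extraction.", relations))
    reply = "BERT | Used-for | relation extraction\nnot a triple\nGPT-4 | Invented | KG"
    print(parse_triples(reply, relations))
    # -> [('BERT', 'Used-for', 'relation extraction')]
```

Filtering replies against the schema reflects one plausible design choice. Free-form LLM output can contain malformed lines or relation labels outside the target ontology. Dropping those lines keeps any downstream scoring against the gold triples strict.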

Paper Structure (22 sections, 5 figures, 4 tables)

Introduction
Recent Capabilities of LLMs for KG Construction and Reasoning
Evaluation Principle
KG Construction and Reasoning
Settings
Overall Results
KG Construction vs. Reasoning
General vs. Specific Domain
Discussion: Why LLMs do not present satisfactory performance on some tasks?
Discussion: Do LLMs have memorized knowledge or truly have the generalization ability?
Data Collection
Preliminary Results
Future Opportunities: Automatic KG Construction and Reasoning
Conclusion and Future Work
Related Work
...and 7 more sections

Figures (5)

Figure 1: The overview of our work. There are three main components: 1) Basic Evaluation: detailing our assessment of large models (text-davinci-003, ChatGPT, and GPT-4), in both zero-shot and one-shot settings, using performance from fully supervised state-of-the-art models as benchmarks; 2) Virtual Knowledge Extraction: an examination of LLMs' virtual knowledge capabilities on the constructed VINE dataset; and 3) Automatic KG: the proposal of utilizing multiple agents to facilitate the construction and reasoning of KGs.
Figure 2: Examples of ChatGPT and GPT-4 on the RE datasets: (1) zero-shot on SciERC; (2) zero-shot on Re-TACRED; (3) one-shot on DuIE2.0.
Figure 3: Examples of the Event Extraction, Link Prediction, and Question Answering tasks.
Figure 4: Prompts used in Virtual Knowledge Extraction. The blue box is the demonstration and the pink box is the corresponding answer.
Figure 5: Illustration of AutoKG, which integrates KG construction and reasoning by employing GPT-4 and communicative agents based on ChatGPT. The figure omits the specific operational process and shows the results directly.
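Virtual Knowledge Extraction probes whether a model can extract facts it cannot have memorized. The paper's actual VINE prompts are shown in Figure 4. The sketch below illustrates one generic way to build such inputs, under stated assumptions. Real entity names are swapped for freshly generated nonce words in both the sentence and its gold triple, so that a correct answer cannot come from recall. This is not claimed to be how VINE was constructed. The helpers `make_pseudoword` and `virtualize`, the consonant-vowel generator, and the example sentence are all hypothetical. The code also assumes that each term starts and ends with a word character, so that the `\b` anchors behave as intended.

```python
import random
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def make_pseudoword(rng: random.Random, syllables: int = 2) -> str:
    """Hypothetical generator: a pronounceable nonce word from random consonant-vowel syllables."""
    consonants, vowels = "bdfgklmnprstvz", "aeiou"
    return "".join(rng.choice(consonants) + rng.choice(vowels) for _ in range(syllables))


def virtualize(sentence: str, triple: Triple, terms: List[str],
               seed: int = 0) -> Tuple[str, Triple, Dict[str, str]]:
    """Replace each real term with a unique nonce word in the sentence and its gold triple."""
    rng = random.Random(seed)  # fixed seed keeps the substitution reproducible
    mapping: Dict[str, str] = {}
    for term in terms:
        word = make_pseudoword(rng)
        while word in mapping.values():  # avoid two terms sharing one nonce word
            word = make_pseudoword(rng)
        mapping[term] = word
    if not mapping:
        return sentence, triple, mapping

    # Single regex pass, longest terms first, so inserted nonce words are never re-substituted.
    alternation = "|".join(re.escape(t) for t in sorted(mapping, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternation})\b")

    def substitute(text: str) -> str:
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    new_triple = (substitute(triple[0]), substitute(triple[1]), substitute(triple[2]))
    return substitute(sentence), new_triple, mapping


if __name__ == "__main__":
    s, t, m = virtualize("Alice founded Acme in 1999.",
                         ("Alice", "founder_of", "Acme"), ["Alice", "Acme"], seed=1)
    print(s)
    print(t)
    print(m)
```

The relation label is left untouched in this sketch, so only the entities become unfamiliar. A stricter probe could also map the relation to a new word and define it in the instruction, as the demonstration-plus-answer format in Figure 4 suggests.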
