Are Large Language Models Effective Knowledge Graph Constructors?
Ruirui Chen, Weifeng Jiang, Chengwei Qin, Bo Xiong, Fiona Liausvia, Dongkyu Choi, Boon Kiat Quek
TL;DR
This paper tackles knowledge graph construction from text using large language models by proposing a hierarchical extraction framework that progresses from initial extraction to splitting and abstraction, while integrating coreference resolution, entity deduplication, and source tracing. It evaluates six diverse LLMs in a zero-shot setting on 3,216 sentences from papers about children's mental well-being, using both structural (Fraction in Giant Component, $F_{GC} = \frac{|C_{max}|}{|V|}$) and semantic (GPT-4.1-based) judgments across stages. The authors demonstrate that the hierarchical approach improves graph connectivity and semantic coherence, though model-specific trade-offs exist in accuracy, coverage, and computation time, and they emphasize the necessity of human validation for gold-standard KG quality. A released dataset of LLM-generated KGs aims to promote transparent, reliable applications in healthcare and other high-stakes domains, while highlighting open challenges in long-text prompting, evaluation, and downstream task integration.
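The structural metric above can be illustrated with a minimal sketch. It assumes that $C_{max}$ denotes the largest connected component and $V$ the set of all nodes, and that edge direction is ignored when determining connectivity; the summary does not spell out either point. The function name and the toy triples are hypothetical and are not taken from the paper's dataset.

```python
from collections import defaultdict, deque


def fraction_in_giant_component(triples):
    """Return |C_max| / |V| for a KG given as (head, relation, tail) triples.

    Assumption: connectivity is computed on the undirected version of the graph.
    """
    adjacency = defaultdict(set)
    for head, _relation, tail in triples:
        adjacency[head].add(tail)
        adjacency[tail].add(head)

    if not adjacency:
        return 0.0  # convention chosen here for an empty graph

    seen = set()
    largest = 0
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search to measure the component containing `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        largest = max(largest, size)

    return largest / len(adjacency)


# Illustrative triples only (not from the released dataset).
toy_triples = [
    ("sleep", "affects", "mood"),
    ("mood", "influences", "school performance"),
    ("screen time", "reduces", "sleep"),
    ("bullying", "increases", "anxiety"),
]
print(fraction_in_giant_component(toy_triples))
```

In this toy graph, four of the six nodes fall into the largest component, so the value is 4/6. A value closer to 1 means that more of the extracted entities are linked into a single connected structure.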
Abstract
Knowledge graphs (KGs) are vital for knowledge-intensive tasks and have shown promise in reducing hallucinations in large language models (LLMs). However, constructing high-quality KGs remains difficult, requiring accurate information extraction and structured representations that support interpretability and downstream utility. Existing LLM-based approaches often focus narrowly on entity and relation extraction, limiting coverage to sentence-level contexts or relying on predefined schemas. We propose a hierarchical extraction framework that organizes information at multiple levels, enabling the creation of semantically rich and well-structured KGs. Using state-of-the-art LLMs, we extract and construct knowledge graphs and evaluate them comprehensively from both structural and semantic perspectives. Our results highlight the strengths and shortcomings of current LLMs in KG construction and identify key challenges for future work. To advance research in this area, we also release a curated dataset of LLM-generated KGs derived from research papers on children's mental well-being. This resource aims to foster more transparent, reliable, and impactful applications in high-stakes domains such as healthcare.
