Construction of Knowledge Graphs: State and Challenges
Marvin Hofer, Daniel Obraczka, Alieh Saeedi, Hanna Köpcke, Erhard Rahm
TL;DR
This paper surveys the state of knowledge graph (KG) construction with a focus on incremental and continuous updates. It defines four key KG construction requirements (input handling, incremental processing, tooling, and quality assurance) and details the tasks needed to build high-quality KGs, including metadata management and ontology management. The authors evaluate 23 KG-specific construction approaches and 7 toolsets against these requirements, finding that support for incremental updates and open-source tooling is limited. They conclude with open challenges and a call for modular, open-source pipelines and robust benchmarks to enable scalable, high-quality KG construction across domains.
Abstract
With knowledge graphs (KGs) at the center of numerous applications such as recommender systems and question answering, the need for generalized pipelines to construct and continuously update such KGs is increasing. While the individual steps that are necessary to create KGs from unstructured (e.g., text) and structured data sources (e.g., databases) are mostly well-researched for their one-shot execution, their adoption for incremental KG updates and the interplay of the individual steps have hardly been investigated in a systematic manner so far. In this work, we first discuss the main graph models for KGs and introduce the major requirements for future KG construction pipelines. Next, we provide an overview of the necessary steps to build high-quality KGs, including cross-cutting topics such as metadata management, ontology development, and quality assurance. We then evaluate the state of the art of KG construction with respect to the introduced requirements for specific popular KGs as well as some recent tools and strategies for KG construction. Finally, we identify areas in need of further research and improvement.
