KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction
Jack Boylan, Shashank Mangla, Dominic Thorn, Demian Gholipour Ghalandari, Parsa Ghaffari, Chris Hokamp
TL;DR
Validating the output of knowledge graph (KG) completion models is hampered by open-world incompleteness and the cost of human annotation. The authors propose KGValidator, a framework that uses LLMs to validate candidate KG triples without gold references, drawing on four sources of evidence: the LLM's intrinsic knowledge, supplied textual context, reference KGs such as Wikidata, and web search. Structured outputs are enforced with Pydantic and the Instructor library. They report improved triple classification accuracy on multiple KG benchmarks, analyze how the type of context affects performance, and discuss the limitations of current open-source LLMs and the need for broader tool support. The work points toward scalable, context-grounded KG validation, with practical implications for maintaining and updating large knowledge bases such as Wikidata.
Abstract
This study explores the use of Large Language Models (LLMs) for automatic evaluation of knowledge graph (KG) completion models. Historically, validating information in KGs has been a challenging task, requiring large-scale human annotation at prohibitive cost. With the emergence of general-purpose generative AI and LLMs, it is now plausible that human-in-the-loop validation could be replaced by a generative agent. We introduce a framework for consistency and validation when using generative models to validate knowledge graphs. Our framework is based upon recent open-source developments for structural and semantic validation of LLM outputs, and upon flexible approaches to fact checking and verification, supported by the capacity to reference external knowledge sources of any kind. The design is easy to adapt and extend, and can be used to verify any kind of graph-structured data through a combination of model-intrinsic knowledge, user-supplied context, and agents capable of external knowledge retrieval.
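To make "structural validation of LLM outputs" concrete, the following is a minimal sketch of checking a validator model's reply against a fixed schema before it is trusted. The framework described here builds on Pydantic and Instructor; this sketch instead uses only the Python standard library as a stand-in. The names `Triple`, `ValidationResult`, `build_prompt`, `parse_validation`, the JSON reply format, and the three-way label set are all assumptions made for illustration, not the authors' actual schema.

```python
import json
from dataclasses import dataclass

# Hypothetical label set; the actual labels used by the framework may differ.
ALLOWED_VERDICTS = ("valid", "invalid", "not enough information")


@dataclass(frozen=True)
class Triple:
    """A candidate KG triple (head, relation, tail)."""
    head: str
    relation: str
    tail: str


@dataclass(frozen=True)
class ValidationResult:
    """A structurally checked verdict for one triple."""
    triple: Triple
    verdict: str
    reason: str


def build_prompt(triple: Triple, context: list[str]) -> str:
    """Combine the triple with optional context passages (user-supplied text,
    reference-KG facts, or search snippets) into a single prompt string."""
    lines = [
        f"Triple: ({triple.head}, {triple.relation}, {triple.tail})",
        "Context:",
    ]
    lines += [f"- {passage}" for passage in context] or ["- (none)"]
    lines.append(
        'Reply with JSON: {"verdict": one of '
        + ", ".join(ALLOWED_VERDICTS)
        + ', "reason": string}'
    )
    return "\n".join(lines)


def parse_validation(raw: str, triple: Triple) -> ValidationResult:
    """Parse a model reply and reject anything that violates the schema.

    Raises ValueError on malformed JSON, an unknown verdict, or a missing
    reason, so that a caller can re-prompt the model instead of silently
    accepting a malformed answer.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    verdict = data.get("verdict")
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    reason = data.get("reason")
    if not isinstance(reason, str) or not reason.strip():
        raise ValueError("a non-empty 'reason' string is required")
    return ValidationResult(triple=triple, verdict=verdict, reason=reason.strip())
```

In use, the string returned by `build_prompt` would be sent to whichever LLM client is available, and the raw reply passed to `parse_validation`. A schema library such as Pydantic would replace the hand-written checks with declarative field types, and the caller could retry whenever validation fails.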
