KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction

Jack Boylan, Shashank Mangla, Dominic Thorn, Demian Gholipour Ghalandari, Parsa Ghaffari, Chris Hokamp

TL;DR

KG validation for knowledge-graph completion is hampered by open-world incompleteness and annotation costs. The authors propose KGValidator, a framework that uses LLMs with contextual evidence (LLM intrinsic knowledge, textual context, reference KGs like Wikidata, and web search) to validate candidate KG triples without gold references, leveraging Pydantic and the Instructor library for structured outputs. They demonstrate improvements in triple classification accuracy across multiple KG benchmarks and analyze how context variety affects performance, while candidly discussing limitations of current open-source LLMs and the need for broader tool support. The work offers a path toward scalable, context-grounded KG validation with practical implications for maintaining and updating large knowledge bases such as Wikidata.

Abstract

This study explores the use of Large Language Models (LLMs) for automatic evaluation of knowledge graph (KG) completion models. Historically, validating information in KGs has been a challenging task, requiring large-scale human annotation at prohibitive cost. With the emergence of general-purpose generative AI and LLMs, it is now plausible that human-in-the-loop validation could be replaced by a generative agent. We introduce a framework for consistency and validation when using generative models to validate knowledge graphs. Our framework is based upon recent open-source developments for structural and semantic validation of LLM outputs, and upon flexible approaches to fact checking and verification, supported by the capacity to reference external knowledge sources of any kind. The design is easy to adapt and extend, and can be used to verify any kind of graph-structured data through a combination of model-intrinsic knowledge, user-supplied context, and agents capable of external knowledge retrieval.
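
To make the idea of structural and semantic validation of LLM outputs concrete, the following is a minimal sketch rather than the authors' implementation. The framework described here relies on Pydantic and the Instructor library; this sketch swaps in standard-library dataclasses so it runs without extra dependencies. The names `Triple`, `ValidationResult`, `ALLOWED_LABELS`, and `parse_validator_output`, as well as the three label strings, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical label set: the validator may accept, reject, or abstain.
ALLOWED_LABELS = ("valid", "invalid", "not_enough_info")


@dataclass(frozen=True)
class Triple:
    """A candidate KG triple (head, relation, tail)."""
    head: str
    relation: str
    tail: str


@dataclass(frozen=True)
class ValidationResult:
    """A structured verdict that the model's raw output must conform to."""
    triple: Triple
    label: str
    reason: str

    def __post_init__(self) -> None:
        # Structural check: the label must come from the fixed set.
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {self.label!r}")
        # Minimal semantic check: a verdict must carry a justification.
        if not self.reason.strip():
            raise ValueError("a non-empty reason is required")


def parse_validator_output(triple: Triple, raw: dict) -> ValidationResult:
    """Turn a raw (already JSON-decoded) model response into a checked result.

    Raises ValueError when the response does not satisfy the schema, which is
    the point at which a caller could re-prompt the model.
    """
    label = str(raw.get("label", "")).strip().lower()
    reason = str(raw.get("reason", ""))
    return ValidationResult(triple=triple, label=label, reason=reason)


# Usage: a well-formed response passes, a malformed one is rejected.
candidate = Triple("James Joyce", "author_of", "Ulysses")
ok = parse_validator_output(
    candidate, {"label": "Valid", "reason": "Supported by the supplied context."}
)
print(ok.label)
```

In a Pydantic-based setup the same constraints would typically be expressed as field types and validators on a model class, with the structured-output library handling retries when validation fails.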

Paper Structure (43 sections, 12 figures, 4 tables)

Figures (12)

  • Figure 1: Framework for Validating Knowledge Graph Triples.
  • Figure 2: An example of the Closed-World Assumption in KG completion. Some of the triples predicted by a KG completion model are true in the real world (e.g. books written by James Joyce) but missing in the test set and would therefore be treated as false positives.
  • Figure 3: An example of Open Information Extraction. Note that in OpenIE, the output schema is not fixed.
  • Figure 4: Validating KGs with LLM Knowledge
  • Figure 5: Validating KGs given Textual Context
  • ...and 7 more figures