Evaluating Class Membership Relations in Knowledge Graphs using Large Language Models

Bradley P. Allen, Paul T. Groth

TL;DR

The paper addresses quality assessment of class membership relations in knowledge graphs (KGs) by introducing a zero-shot chain-of-thought classifier that uses intensional natural-language class definitions. It formalizes a neurosymbolic workflow that integrates KGs with large language models (LLMs) and evaluates the approach on Wikidata and CaLiGraph using seven LLMs; with gpt-4-0125-preview, the macro-averaged F1-score reaches 0.830 on Wikidata and 0.893 on CaLiGraph. A manual error analysis attributes 40.9% of misclassifications to the KGs themselves (16.0% to missing relations and 24.9% to incorrectly asserted relations), with further errors traced to incomplete entity descriptions, supporting the claim that LLMs can aid KG refinement. The work demonstrates a practical method for interactive KG quality assurance and provides code and data to enable replication and further exploration.

Abstract

Class membership relations, which assign entities to a given class, are a backbone of knowledge graphs. As part of the knowledge engineering process, we propose a new method for evaluating the quality of these relations by processing descriptions of a given entity and class using a zero-shot chain-of-thought classifier that uses a natural language intensional definition of a class. We evaluate the method using two publicly available knowledge graphs, Wikidata and CaLiGraph, and 7 large language models. Using the gpt-4-0125-preview large language model, the method achieves a macro-averaged F1-score of 0.830 on data from Wikidata and 0.893 on data from CaLiGraph. Moreover, a manual analysis of the classification errors shows that 40.9% of errors were due to the knowledge graphs, with 16.0% due to missing relations and 24.9% due to incorrectly asserted relations. These results show how large language models can assist knowledge engineers in the process of knowledge graph refinement. The code and data are available on GitHub.
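
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-averaged F1-score can be computed for a binary class membership classifier. It is not the authors' evaluation code: the function name `macro_f1`, the label strings `"positive"` and `"negative"`, and the toy data are assumptions chosen for illustration only.

```python
def macro_f1(y_true, y_pred, labels=("positive", "negative")):
    """Unweighted mean of per-label F1 scores (hypothetical helper)."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        # Count outcomes treating `label` as the class of interest.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        f1_scores.append(f1)
    # Macro-averaging gives each label equal weight regardless of frequency.
    return sum(f1_scores) / len(f1_scores)


# Toy example (made-up labels, not data from the paper):
# gold labels come from the KG, predictions from a classifier.
gold = ["positive", "positive", "positive", "negative"]
predicted = ["positive", "positive", "negative", "negative"]
score = macro_f1(gold, predicted)
```

Because each label contributes equally to the average, a classifier cannot obtain a high macro-averaged score by performing well only on the more frequent label.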

Paper Structure (6 sections, 2 tables)

This paper contains 6 sections, 2 tables.