Assessing LLMs Suitability for Knowledge Graph Completion

Vasile Ionut Remus Iga, Gheorghe Cosmin Silaghi

TL;DR

This work evaluates the suitability of Large Language Models for Knowledge Graph Completion in static knowledge graphs embedded within Task-Oriented Dialogue systems. It compares Mixtral-8x7b-Instruct-v0.1, GPT-3.5-Turbo-0125, and GPT-4o using TELeR-based prompting across Zero- and One-Shot settings on two domain-tailored datasets, with both strict and flexible evaluation metrics. The contributions include two personalized KGC datasets, a structured prompting framework, and a flexible post-processing metric scheme, revealing that GPT-4o is the most reliable across settings while Mixtral struggles with strict formats. The findings inform practical integration of KG completion in ontology-enhanced TOD systems and point toward retrieval-augmented generation and careful prompting as promising directions for robust, cost-aware deployment.

Abstract

Recent work has shown the capability of Large Language Models (LLMs) to solve tasks related to Knowledge Graphs, such as Knowledge Graph Completion, even in Zero- or Few-Shot paradigms. However, they are known to hallucinate answers or to produce non-deterministic output, leading to wrongly reasoned responses even when those responses satisfy the user's demands. To highlight opportunities and challenges in knowledge-graph-related tasks, we experiment with three distinguished LLMs, namely Mixtral-8x7b-Instruct-v0.1, GPT-3.5-Turbo-0125 and GPT-4o, on Knowledge Graph Completion for static knowledge graphs, using prompts constructed following the TELeR taxonomy, in Zero- and One-Shot contexts, on a Task-Oriented Dialogue system use case. When evaluated with both strict and flexible metrics, our results show that LLMs could be fit for such a task if prompts encapsulate sufficient information and relevant examples.

Paper Structure (8 sections, 3 figures, 6 tables)

Figures (3)

  • Figure 1: The ontology used throughout the experiments, with three classes and six relationships.
  • Figure 2: Example of a dictionary object with its text-related details.
  • Figure 3: The level 1 system prompt.