Enhancing Knowledge Graph Construction Using Large Language Models

Milena Trajanoska, Riste Stojanov, Dimitar Trajanov

TL;DR

The paper tackles the challenge of integrating large language models with semantic technologies to construct Knowledge Graphs from unstructured text. It compares REBEL and ChatGPT on a sustainability use case to extract entities and relations and to explore ontology generation. The findings show that while REBEL provides structured triplets, carefully prompted ChatGPT can generate higher-quality ontologies and instance data, improving KG usefulness. The work demonstrates a viable pathway for automatic KG construction from web data and highlights directions for formal evaluation and cross-domain generalization.

Abstract

The growing trend of Large Language Model (LLM) development has attracted significant attention, with models for various applications emerging consistently. However, the combined application of LLMs with semantic technologies for reasoning and inference is still a challenging task. This paper analyzes how current advances in foundational LLMs, like ChatGPT, compare with specialized pretrained models, like REBEL, for joint entity and relation extraction. To evaluate this approach, we conducted several experiments using sustainability-related text as our use case. We created pipelines for the automatic creation of Knowledge Graphs from raw text, and our findings indicate that using advanced LLMs can improve the accuracy of creating these graphs from unstructured text. Furthermore, we explored the potential of automatic ontology creation using foundation LLMs, which resulted in even more relevant and accurate knowledge graphs.

Paper Structure (14 sections, 4 figures, 1 table)

Figures (4)

  • Figure 1: Subset of the Knowledge Base generated using the REBEL model. The Knowledge Base is displayed in a graph format where entities are represented as nodes and relations are represented as edges.
  • Figure 2: Subset of the Knowledge Base generated using the first experiment with ChatGPT. The Knowledge Base is displayed in a graph format where entities are represented as nodes and relations are represented as edges.
  • Figure 3: Knowledge Base generated with ChatGPT for the first article. The identified concepts are represented as yellow rectangles, and the instances are represented with green rectangles.
  • Figure 4: Knowledge Base generated with ChatGPT for the second article. The identified concepts are represented as yellow rectangles, and the instances are represented with green rectangles.
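The figure captions describe Knowledge Bases drawn as graphs, with entities as nodes and relations as edges. As a minimal sketch of that representation (not the authors' pipeline), the snippet below turns a list of (head, relation, tail) triplets into a node set and an edge map. The triplets and the `build_graph` helper are hypothetical, invented only for illustration; they are not output from REBEL or ChatGPT.

```python
from collections import defaultdict

# Hypothetical triplets, made up for illustration; not taken from the paper.
triplets = [
    ("solar power", "subclass of", "renewable energy"),
    ("wind power", "subclass of", "renewable energy"),
    ("renewable energy", "facet of", "sustainability"),
]


def build_graph(triplets):
    """Build a graph from (head, relation, tail) triplets.

    Returns a set of nodes (entities) and a dict mapping each head
    entity to a list of (relation, tail) edges.
    """
    nodes = set()
    edges = defaultdict(list)
    for head, relation, tail in triplets:
        # Both ends of a triplet become nodes.
        nodes.update((head, tail))
        # Skip exact duplicate edges so repeated extractions do not pile up.
        if (relation, tail) not in edges[head]:
            edges[head].append((relation, tail))
    return nodes, dict(edges)


nodes, edges = build_graph(triplets)
for head, outgoing in edges.items():
    for relation, tail in outgoing:
        print(f"{head} --[{relation}]--> {tail}")
```

Keeping the relation label on each edge preserves the triplet structure, so the same data could later be exported to an RDF store or a graph library if needed.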