
Ontology Generation using Large Language Models

Anna Sofia Lippolis, Mohammad Javad Saeedizade, Robin Keskisärkkä, Sara Zuppiroli, Miguel Ceriani, Aldo Gangemi, Eva Blomqvist, Andrea Giovanni Nuzzolese

TL;DR

This work investigates using large language models to draft OWL ontologies from natural-language requirements. It introduces two prompting techniques, Memoryless CQbyCQ and Ontogenia, and evaluates them through a multi-dimensional framework. It presents a benchmark dataset of ten ontologies, 100 competency questions, and 29 user stories, and compares three LLMs across independent and incremental generation settings. The results show that OpenAI o1-preview with Ontogenia often yields higher-quality ontologies than baselines and novice human modellers. They also reveal persistent issues such as superfluous elements and incorrect domain/range axioms. The study demonstrates the feasibility of LLM-assisted ontology drafting. It also emphasizes the need for comprehensive, human-in-the-loop evaluation and for future work that further reduces errors and improves practical tooling for ontology engineers.

Abstract

The ontology engineering process is complex, time-consuming, and error-prone, even for experienced ontology engineers. In this work, we investigate the potential of Large Language Models (LLMs) to provide effective OWL ontology drafts directly from ontological requirements described using user stories and competency questions. Our main contribution is the presentation and evaluation of two new prompting techniques for automated ontology development: Memoryless CQbyCQ and Ontogenia. We also emphasize the importance of three structural criteria for ontology assessment, alongside expert qualitative evaluation, highlighting the need for a multi-dimensional evaluation in order to capture the quality and usability of the generated ontologies. Our experiments, conducted on a benchmark dataset of ten ontologies with 100 distinct CQs and 29 different user stories, compare the performance of three LLMs using the two prompting techniques. The results demonstrate improvements over the current state-of-the-art in LLM-supported ontology engineering. More specifically, OpenAI o1-preview with Ontogenia produces ontologies of sufficient quality to meet the requirements of ontology engineers, significantly outperforming novice ontology engineers in modelling ability. However, we still observe some common mistakes and variability in result quality, both of which are important to take into account when using LLMs for ontology authoring support. We discuss these limitations and propose directions for future research.
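The Figure 2 caption describes two generation settings. In the first, the LLM drafts an ontology for a single CQ, which is evaluated individually. In the second, it drafts one ontology covering multiple CQs associated with a single user story. The Python sketch below shows one way requirements could be split into generation tasks for each setting. The `UserStory` and `GenerationTask` classes, the helper functions, and the toy stories are hypothetical illustrations. They are not the authors' code or benchmark data. Passing the story text along with each single-CQ task is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class UserStory:
    """Hypothetical container: a user story and its competency questions (CQs)."""
    story_id: str
    text: str
    cqs: list[str]


@dataclass
class GenerationTask:
    """Hypothetical unit of work: the requirements one ontology draft must cover."""
    story_id: str
    story_text: str  # assumption: the story is passed as context in both settings
    cqs: list[str]


def per_cq_tasks(stories: list[UserStory]) -> list[GenerationTask]:
    """First setting: one draft per CQ, each evaluated on its own."""
    return [
        GenerationTask(s.story_id, s.text, [cq])
        for s in stories
        for cq in s.cqs
    ]


def per_story_tasks(stories: list[UserStory]) -> list[GenerationTask]:
    """Second setting: one draft per user story, covering that story's CQs."""
    return [GenerationTask(s.story_id, s.text, list(s.cqs)) for s in stories]


# Toy input for illustration only; not taken from the benchmark.
stories = [
    UserStory("S1", "As a curator, I want to record who created each artwork.",
              ["Who created a given artwork?", "When was an artwork created?"]),
    UserStory("S2", "As a visitor, I want to know where exhibitions take place.",
              ["Where is an exhibition held?"]),
]

print(len(per_cq_tasks(stories)))     # 3
print(len(per_story_tasks(stories)))  # 2
```

With three CQs spread over two stories, the first setting yields three independent drafts and the second yields two, one per story.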

Paper Structure

This paper contains 34 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Illustration of Memoryless CQbyCQ (top part) and Ontogenia (bottom part).
  • Figure 2: Illustration of the two ontology generation settings (top) and the four evaluation steps for assessing the generated ontologies (bottom). In the first setup, the LLM generates an ontology for a single CQ, which is evaluated individually. In the second setup, it generates an ontology covering multiple CQs associated with one user story, which is then evaluated. The bottom part shows the four evaluation steps: OOPS!, the proportion of modelled CQs, statistics on superfluous elements, and expert evaluation.
  • Figure 3: Scores for "SemanticWebCourse": the proportion of CQs accurately modelled by the outputs of the different prompting techniques, compared with students' submissions. 'IG' indicates results when minor issues are ignored. Llama⋆ refers to Llama-3.1-405B-instruct-bf16.
  • Figure 4: Analysis of an ontology for a CQ. Part A shows a correctly modelled CQ: all elements needed for a SPARQL query are present, but the ontology also contains superfluous elements. Part B shows a minor issue, where a data property is missing.
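Figure 3 scores outputs by the proportion of CQs that were accurately modelled and adds an 'IG' variant that ignores minor issues. Figure 4 distinguishes a correctly modelled CQ from one with a minor issue. The Python sketch below computes that proportion. It assumes each CQ receives one of three hypothetical verdicts and reads 'IG' as counting minor-issue CQs as modelled. The `CQVerdict` enum and the `proportion_modelled` function are illustrative names, not taken from the paper. Superfluous elements are deliberately not counted, because the Figure 2 caption lists them as a separate evaluation step.

```python
from enum import Enum


class CQVerdict(Enum):
    """Hypothetical per-CQ judgement assigned to a generated ontology."""
    MODELLED = "modelled"          # all elements needed to answer the CQ are present
    MINOR_ISSUE = "minor_issue"    # e.g. a missing data property (Figure 4, Part B)
    NOT_MODELLED = "not_modelled"


def proportion_modelled(verdicts: list[CQVerdict], ignore_minor: bool = False) -> float:
    """Fraction of CQs judged accurately modelled.

    With ignore_minor=True (our reading of the 'IG' variant), CQs that have
    only minor issues are also counted as modelled. Superfluous elements do
    not affect this score; they are tracked as a separate statistic.
    """
    if not verdicts:
        raise ValueError("no CQs to score")
    accepted = {CQVerdict.MODELLED}
    if ignore_minor:
        accepted.add(CQVerdict.MINOR_ISSUE)
    hits = sum(1 for v in verdicts if v in accepted)
    return hits / len(verdicts)


# Toy verdicts for four CQs; illustrative only, not results from the paper.
verdicts = [CQVerdict.MODELLED, CQVerdict.MINOR_ISSUE,
            CQVerdict.NOT_MODELLED, CQVerdict.MODELLED]
print(proportion_modelled(verdicts))                     # 0.5
print(proportion_modelled(verdicts, ignore_minor=True))  # 0.75
```

Keeping the minor-issue verdict separate lets the same judgements produce both the strict score and the 'IG' score without re-evaluating the ontology.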