The Mercurial Top-Level Ontology of Large Language Models

Nele Köhler, Fabian Neuhaus

TL;DR

This paper investigates implicit ontological commitments in the outputs of large language models, using ChatGPT 3.5 as a case study. It defines ontology as a theory that captures a text's ontological commitments and derives a top-level ontology (GPT's TLO) from systematic prompts and transcript analysis, acknowledging the methodological challenges posed by stochastic generation. The authors present a hierarchical taxonomy and an OWL representation, discuss fundamental assumptions (e.g., mereology, endurantism vs perdurantism, materialism), and compare GPT's TLO to established frameworks like BFO and DOLCE, highlighting both similarities and significant differences. They conclude that while LLMs offer a stable core of ontological commitments, their mercurial, non-axiomatic nature introduces ontological overload and inconsistencies, necessitating careful cross-prompt validation and integration strategies for ontology engineering.

Abstract

In our work, we systematize and analyze implicit ontological commitments in the responses generated by large language models (LLMs), focusing on ChatGPT 3.5 as a case study. We investigate how LLMs, despite having no explicit ontology, exhibit implicit ontological categorizations that are reflected in the texts they generate. The paper proposes an approach to understanding the ontological commitments of LLMs by defining ontology as a theory that provides a systematic account of the ontological commitments of some text. We investigate the ontological assumptions of ChatGPT and present a systematized account, i.e., GPT's top-level ontology. This includes a taxonomy, which is available as an OWL file, as well as a discussion about ontological assumptions (e.g., about its mereology or presentism). We show that in some aspects GPT's top-level ontology is quite similar to existing top-level ontologies. However, there are significant challenges arising from the flexible nature of LLM-generated texts, including ontological overload, ambiguity, and inconsistency.

Paper Structure (8 sections, 4 figures)

Figures (4)

  • Figure 1: ChatGPT uses categories like 'living organism' and 'inanimate object'.
  • Figure 2: Hierarchy of ChatGPT
  • Figure 3: Classification of a shadow as a concrete entity
  • Figure 4: Classification of a shadow distinct from a concrete entity.

Theorems & Definitions (1)

  • Definition 1