Large language models as oracles for instantiating ontologies with domain-specific knowledge

Giovanni Ciatto, Andrea Agiollo, Matteo Magnini, Andrea Omicini

TL;DR

Ontology population is traditionally manual or data-driven, often biased or data-dependent. The paper introduces KGFiller, a domain-independent pipeline that uses Large Language Models as oracles, starting from an initial ontology schema and query templates, to automatically generate instances, relations, and refined classifications through four phases (population, relation, redistribution, merge). A Python implementation populates a nutrition ontology and is evaluated across eight LLM families, with a quality metric defined as $Q = \dfrac{TI-TE + TR - E_{wr}}{TI + TR}$; results show high reliability in many runs (e.g., $Q$ up to about 0.91–0.93 for GPT-3.5/GPT-4 Turbo) and substantial reductions in errors compared to a state-of-the-art baseline. The work analyzes error types, QoS, and model trade-offs, demonstrating that larger oracles generally yield larger yet more accurate ontologies, while also highlighting hallucinations and potential biases. Overall, KGFiller offers a scalable, incremental, and general approach to automate ontology population, with practical impact for rapid domain knowledge instantiation under budget and reliability considerations.
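
As a worked illustration of the quality metric, the following minimal Python sketch evaluates $Q$ for made-up counts. The summary above does not define the symbols, so the reading used here is an assumption: $TI$ as the total number of generated individuals, $TE$ as the erroneous individuals, $TR$ as the total number of generated relations, and $E_{wr}$ as the wrong relations. The function name `quality` and the numbers are hypothetical and are not taken from the paper's results.

```python
def quality(ti: int, te: int, tr: int, e_wr: int) -> float:
    """Compute Q = (TI - TE + TR - E_wr) / (TI + TR).

    Assumed reading (not defined in the summary): ti/te are the total and
    erroneous individuals, tr/e_wr are the total and wrong relations.
    """
    total = ti + tr
    if total == 0:
        raise ValueError("Q is undefined for an empty ontology (TI + TR = 0)")
    return (ti - te + tr - e_wr) / total


# Made-up counts, for illustration only.
q = quality(ti=100, te=10, tr=50, e_wr=5)
print(round(q, 2))
```

Under this reading, $Q = 1 - \dfrac{TE + E_{wr}}{TI + TR}$, so it equals 1 when no errors are found and decreases linearly with each erroneous individual or relation.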

Abstract

Background. Endowing intelligent systems with semantic data commonly requires designing and instantiating ontologies with domain-specific knowledge. Especially in the early phases, those activities are typically performed manually by human experts, possibly drawing on their own experience. The resulting process is therefore time-consuming, error-prone, and often biased by the personal background of the ontology designer. Objective. To mitigate that issue, we propose a novel domain-independent approach to automatically instantiate ontologies with domain-specific knowledge, by leveraging large language models (LLMs) as oracles. Method. Starting from (i) an initial schema composed of inter-related classes and properties and (ii) a set of query templates, our method queries the LLM multiple times, and generates instances for both classes and properties from its replies. Thus, the ontology is automatically filled with domain-specific knowledge, compliant with the initial schema. As a result, the ontology is quickly and automatically enriched with manifold instances, which experts may keep, adjust, discard, or complement according to their own needs and expertise. Contribution. We formalise our method in a general way and instantiate it over various LLMs, as well as on a concrete case study. We report experiments rooted in the nutritional domain, where an ontology of food meals and their ingredients is automatically instantiated from scratch, starting from a categorisation of meals and their relationships. There, we analyse the quality of the generated ontologies and compare ontologies attained by exploiting different LLMs. Experimentally, our approach achieves a quality metric that is up to five times higher than the state-of-the-art, while reducing erroneous entities and relations by up to ten times. Finally, we provide a SWOT analysis of the proposed method.
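
The method described in the abstract (a schema plus query templates, with instances extracted from the oracle's replies) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Oracle` type, the template wording, the function names, and the canned replies are all hypothetical, and a lookup table stands in for a real LLM. The class and instance names follow the cartoon-animal running example of the paper's Figure 2.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical oracle interface: a prompt goes in, a list of names comes out.
Oracle = Callable[[str], List[str]]

# Hypothetical query templates (the paper's actual wording is not shown here).
CLASS_TEMPLATE = "List instances of the class {cls}."
PROPERTY_TEMPLATE = "Which entities is {subject} related to via {prop}?"

# Canned replies standing in for a real LLM (assumption, for illustration).
CANNED_REPLIES: Dict[str, List[str]] = {
    "List instances of the class Cat.": ["tom"],
    "List instances of the class Mouse.": ["jerry"],
    "List instances of the class Dog.": ["pluto"],
    "Which entities is jerry related to via chasedBy?": ["tom"],
    "Which entities is tom related to via chasedBy?": ["spike"],
}


def stub_oracle(prompt: str) -> List[str]:
    """Return the canned reply for a prompt, or no names if unknown."""
    return CANNED_REPLIES.get(prompt, [])


def populate(classes: List[str], template: str, oracle: Oracle) -> Dict[str, Set[str]]:
    """Ask the oracle for instances of each class in the schema."""
    return {cls: set(oracle(template.format(cls=cls))) for cls in classes}


def relate(
    subjects: Set[str], prop: str, template: str, oracle: Oracle
) -> Set[Tuple[str, str, str]]:
    """Ask the oracle, for each known instance, which entities it relates to."""
    return {
        (subject, prop, obj)
        for subject in subjects
        for obj in oracle(template.format(subject=subject, prop=prop))
    }


instances = populate(["Cat", "Mouse", "Dog"], CLASS_TEMPLATE, stub_oracle)
all_names = set().union(*instances.values())
relations = relate(all_names, "chasedBy", PROPERTY_TEMPLATE, stub_oracle)

print(instances)
print(sorted(relations))
```

Keeping the oracle behind a plain callable means the same loop could be pointed at different LLM back-ends, which is one plausible way to compare oracles as the abstract describes. Note that `spike` appears only as the object of a relation here; assigning it to a class is what the later redistribution phase is for.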

Paper Structure (87 sections, 2 equations, 5 figures, 5 tables, 5 algorithms)


Figures (5)

  • Figure 1: Conceptualization of LLM services
  • Figure 2: Overview of KGFiller, based on a running example. The example assumes that the ontology to be populated is about animals, and it includes the classes $\mathit{Cat}, \mathit{Dog}, \mathit{Mouse} \sqsubset \mathit{Animal} \sqsubset \mathit{Thing} \equiv \top$ (none of which has any instance yet), as well as a property $\mathsf{chasedBy} : \mathit{Animal} \times \top$, stating that each animal may be chased by some other entity. The initial state of the input ontology is depicted in the top-left box: classes are represented as yellow boxes, property definitions as dashed edges with black arrows, and subsumption relations among classes as straight edges with big white arrows. The KGFiller algorithm then proceeds through four phases, each depicted in a separate box: in each box, differences with respect to the previous state are highlighted in blue. The bottom-left box represents the outcome of the first phase, namely the population phase, in which the LLM is queried to generate instances for the classes in the ontology. Instances are represented as green ellipses, whereas the relations between instances and classes are depicted as straight edges with white diamonds. Accordingly, in this phase, we let the LLM return animals from old cartoons such as $\mathtt{tom}$ (which is a cat), $\mathtt{jerry}$ (which is a mouse), $\mathtt{pluto}$ (which is a dog), and $\mathtt{bugs}$ and $\mathtt{bugs\_bunny}$, i.e., two different names referencing the same entity (which is a rabbit). Rabbits get assigned to the $\mathit{Animal}$ class, as no better class is available in the ontology. In the next phase (bottom-middle box), the relation phase, the LLM is queried to generate relations between instances, w.r.t. the properties in the ontology. This yields the relations $\mathsf{chasedBy}(\mathtt{jerry}, \mathtt{tom})$ and $\mathsf{chasedBy}(\mathtt{tom}, \mathtt{spike})$, where $\mathtt{spike}$ is a novel instance of a dog, generated on the fly as an instance of $\top$, since $\top$ is the range of $\mathsf{chasedBy}$. This new entry is eventually moved into the $\mathit{Dog}$ class during the redistribution phase (bottom-right box), where the LLM is queried to redistribute instances among sub-classes. In the process, $\mathtt{spike}$ is moved from $\top$ to $\mathit{Animal}$ and then to $\mathit{Dog}$, where it belongs. Finally, in the merging phase (top-right box), the LLM is queried to merge instances that are syntactically similar. This is the case, for instance, of $\mathtt{bugs}$ and $\mathtt{bugs\_bunny}$, which are merged into a single entity. This instance should remain in the $\mathit{Animal}$ class, as no other class in the ontology is more specific than $\mathit{Animal}$ for this particular individual.
  • Figure 3: Class hierarchy of the case study ontology. Notice that the hierarchy is not really a tree, but rather a DAG. The asterisk (*) denotes classes having multiple super-classes (they are depicted once per super-class for the sake of readability).
  • Figure 4: Comparison of the performance of Harvest [HaoTTNSZXH23] and KGFiller w.r.t. the task of populating our food ontology (cf. \ref{['ssec:ontology']}). For KGFiller, the best-performing closed-source and open-source LLMs are considered: GPT 3.5 Turbo and Nous Hermes, respectively.
  • Figure 5: Parse tree for the response in \ref{['lst:fake-cat-names']}, parsed according to the grammar in \ref{['response-grammar']}. Colours highlight relevant sub-trees of the parse tree. Blue and cyan ellipses (in alternating colours) highlight the names output by the ExtractNames function when fed with that response.