Ontological Relations from Word Embeddings

Mathieu d'Aquin, Emmanuel Nauer

TL;DR

This work questions whether word embeddings capture ontological relations beyond semantic similarity and tests this by building embeddings from short entity names and comments across five web ontologies. A simple feed-forward head on top of embeddings from four pre-trained transformers is trained to predict 20 direct or inferred relations, using datasets constructed from ontology graphs and RDF entailment rules. The study finds that embeddings indeed encode ontological information, with Llama2-based embeddings delivering the strongest performance, though results vary by ontology quality and size; cross-ontology generalization is limited, and a global model offers only modest gains. The findings suggest a path toward integrating ontological knowledge into neural systems and enhancing ontology matching and evolution, by leveraging large, diverse web-ontology-derived embeddings as a knowledge substrate.
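
The TL;DR says the training data combines direct relations from the ontology graphs with relations inferred through RDF entailment rules. Below is a minimal sketch of that idea. It assumes the two transitivity rules of RDFS (rdfs11 for rdfs:subClassOf, rdfs5 for rdfs:subPropertyOf) and labels each pair as "direct" or "inferred". The function names, the label strings, and the toy ontology fragment are hypothetical. The paper's actual 20 relations and its handling of negative pairs are not described in this summary.

```python
from collections import defaultdict


def transitive_closure(edges):
    """All (a, c) pairs linked by a chain of one or more edges.

    This mirrors the RDFS transitivity entailment rules:
    rdfs11 for rdfs:subClassOf and rdfs5 for rdfs:subPropertyOf.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
    closure = set()
    for start in list(graph):
        stack, seen = list(graph[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(graph.get(node, ()))  # .get avoids creating new keys
    return closure


def build_examples(sub_class_of, sub_property_of):
    """Label entity pairs as direct or inferred relations.

    The label names (e.g. "inferred_subClassOf") are hypothetical; the
    paper's own list of 20 relations is not given in this summary.
    """
    examples = []
    for relation, edges in (("subClassOf", sub_class_of),
                            ("subPropertyOf", sub_property_of)):
        direct = set(edges)
        for a, c in sorted(transitive_closure(direct)):
            kind = "direct" if (a, c) in direct else "inferred"
            examples.append((a, c, f"{kind}_{relation}"))
    return examples


# Toy ontology fragment (hypothetical, not taken from the paper's datasets).
classes = [("Dog", "Mammal"), ("Mammal", "Animal")]
properties = [("hasMother", "hasParent")]
for example in build_examples(classes, properties):
    print(example)
```

In this toy fragment, ("Dog", "Animal") is labeled as an inferred subClassOf because it is entailed but not asserted. A real training set would also need pairs that hold no relation at all.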

Abstract

It has been reliably shown that the similarity of word embeddings obtained from popular neural models such as BERT effectively approximates a form of semantic similarity between the meanings of those words. It is therefore natural to ask whether those embeddings contain enough information to connect those meanings through ontological relationships such as subsumption. If so, large knowledge models could be built that are capable of semantically relating terms based on the information encapsulated in word embeddings produced by pre-trained models, with implications not only for ontologies (ontology matching, ontology evolution, etc.) but also for the ability to integrate ontological knowledge into neural models. In this paper, we test how embeddings produced by several pre-trained models can be used to predict relations existing between classes and properties of popular upper-level and general ontologies. We show that even a simple feed-forward architecture on top of those embeddings can achieve promising accuracies, with varying generalisation abilities depending on the input data. In doing so, we produce a dataset that can be used to further enhance those models, opening new possibilities for applications integrating knowledge from web ontologies.
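
The "simple feed-forward architecture" can be pictured as a small head that reads the embeddings of two entities and scores each of the 20 relations. The pure-Python sketch below shows only a forward pass, under several assumptions that are not taken from the paper. First, the two embeddings are concatenated. Second, there is one ReLU hidden layer of hypothetical size 16. Third, the output uses independent sigmoid scores, so a pair can hold several relations at once; a softmax would be the alternative if relations were treated as mutually exclusive. The class name `FeedForwardHead` and the placeholder vectors are illustrative. Real inputs would be embeddings of each entity's short name and comment from a pre-trained model. Training, for example with binary cross-entropy, is not shown.

```python
import math
import random

NUM_RELATIONS = 20  # the paper predicts among 20 relations


class FeedForwardHead:
    """Hypothetical head: concat(e1, e2) -> ReLU hidden layer -> 20 sigmoid scores."""

    def __init__(self, emb_dim, hidden_dim=16, num_relations=NUM_RELATIONS, seed=0):
        rng = random.Random(seed)
        in_dim = 2 * emb_dim  # two entity embeddings, concatenated
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden_dim)] for _ in range(num_relations)]
        self.b2 = [0.0] * num_relations

    @staticmethod
    def _affine(weights, bias, x):
        # One dense layer: W @ x + b, written out with plain lists.
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

    def forward(self, emb1, emb2):
        x = list(emb1) + list(emb2)
        hidden = [max(0.0, h) for h in self._affine(self.w1, self.b1, x)]  # ReLU
        logits = self._affine(self.w2, self.b2, hidden)
        # Independent per-relation probabilities (multi-label assumption).
        return [1.0 / (1.0 + math.exp(-z)) for z in logits]


head = FeedForwardHead(emb_dim=8)
entity_a = [0.1] * 8  # placeholder for an embedding of "name + comment"
entity_b = [0.2] * 8
scores = head.forward(entity_a, entity_b)
predicted_relations = [i for i, s in enumerate(scores) if s > 0.5]
```

With untrained random weights, the scores sit close to 0.5. The example only shows the shapes involved, not meaningful predictions.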

Paper Structure

This paper contains 19 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Overview of the architecture of the model predicting ontological relations (among 20) for two entities from the word embeddings of their short names and comments.
  • Figure 2: Precision, recall, and F-score (in %) obtained by testing the Llama2-based model trained on each ontology (rows) on the validation set of each ontology (columns).
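
The cross-ontology grid described for Figure 2 can be sketched as follows. A model trained on one ontology is scored against the validation set of every ontology, which gives one precision/recall/F-score cell per (training ontology, test ontology) pair. This sketch assumes micro-averaging over (entity pair, relation) labels. The paper's exact averaging is not stated here. The function names and the two toy "ontologies" A and B are hypothetical, and the numbers they produce are illustrative only, not the paper's results.

```python
def prf(gold, predicted):
    """Micro-averaged precision, recall, and F-score (in %) over label sets."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return tuple(round(100 * v, 1) for v in (precision, recall, f))


def cross_ontology_grid(models, validation_sets):
    """models: name -> predict(pair) returning a set of relation labels.
    validation_sets: name -> {pair: set of gold relation labels (may be empty)}.
    Returns {(train_name, test_name): (P, R, F)}.
    """
    grid = {}
    for train_name, predict in models.items():
        for test_name, gold_by_pair in validation_sets.items():
            gold = {(pair, r) for pair, rels in gold_by_pair.items() for r in rels}
            predicted = {(pair, r) for pair in gold_by_pair for r in predict(pair)}
            grid[(train_name, test_name)] = prf(gold, predicted)
    return grid


# Hypothetical toy data: each "model" always predicts a single relation.
validation = {
    "A": {("Dog", "Animal"): {"subClassOf"}, ("Dog", "Cat"): set()},
    "B": {("hasMother", "hasParent"): {"subPropertyOf"}},
}
models = {
    "A": lambda pair: {"subClassOf"},
    "B": lambda pair: {"subPropertyOf"},
}
grid = cross_ontology_grid(models, validation)
```

In this toy run, the diagonal cells (train and test on the same ontology) score higher than the off-diagonal ones. This is the kind of pattern a cross-ontology grid makes visible when generalization is limited.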