Complex Ontology Matching with Large Language Model Embeddings

Guilherme Sousa; Rinaldo Lima; Cassia Trojahn

TL;DR

This work addresses the lack of expressiveness in ontology and knowledge graph alignment by integrating large language model embeddings into a CANARD-based, SPARQL-guided framework. It introduces four embedding-based modifications (Label embedding similarity, Embeddings of SPARQL query, Subgraph embeddings, and Instance embeddings) that change how surrounding subgraphs are matched, using pre-trained models with no additional training. In experiments on the populated OAEI Conference benchmark, the approach achieves higher precision and F-measure than the baseline and several state-of-the-art systems, and the results show the impact of each modification. Because the method relies only on user needs expressed as SPARQL queries and on pre-trained embeddings, it is broadly applicable and scalable for complex matching tasks. Directions for future work include pure T-Box strategies and ontology partitioning.

Abstract

Ontology matching, and more broadly knowledge graph matching, is a challenging task in which expressiveness has not been fully addressed. Despite the increasing use of embeddings and language models for this task, approaches for generating expressive correspondences still do not take full advantage of these models, in particular large language models (LLMs). This paper proposes to integrate LLMs into an approach for generating expressive correspondences based on alignment need and ABox-based relation discovery. Correspondences are generated by matching similar surroundings of instance sub-graphs. The integration of LLMs results in different architectural modifications, including label similarity, sub-graph matching, and entity matching. The performance of word embeddings, sentence embeddings, and LLM-based embeddings was compared. The results demonstrate that integrating LLMs surpasses all other models, enhancing the baseline version of the approach with a 45% increase in F-measure.
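
The label similarity modification mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors are made-up stand-ins for embeddings that a pre-trained model would produce, and the function names, the example labels, and the 0.8 threshold are hypothetical choices made only for illustration. The sketch compares one source label with candidate target labels by cosine similarity and keeps the best candidate above the threshold.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; a real system would obtain these from a
# pre-trained embedding model instead of hard-coding them.
toy_embeddings = {
    "accepted paper": [0.9, 0.1, 0.2],
    "paper with acceptance decision": [0.8, 0.2, 0.3],
    "conference venue": [0.1, 0.9, 0.1],
}


def best_label_match(source_label, candidate_labels, embeddings, threshold=0.8):
    """Return (label, score) for the most similar candidate, or None.

    The threshold is an assumed cut-off, not a value taken from the paper.
    """
    source_vec = embeddings[source_label]
    best = None
    for label in candidate_labels:
        score = cosine_similarity(source_vec, embeddings[label])
        if score >= threshold and (best is None or score > best[1]):
            best = (label, score)
    return best


match = best_label_match(
    "accepted paper",
    ["paper with acceptance decision", "conference venue"],
    toy_embeddings,
)
print(match)
```

With these toy vectors, the label about acceptance decisions is selected and the venue label falls below the threshold. Swapping exact string comparison for a similarity score of this kind is what lets labels with different wording but related meaning be matched.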
