Complex Ontology Matching with Large Language Model Embeddings
Guilherme Sousa, Rinaldo Lima, Cassia Trojahn
TL;DR
This work addresses the expressiveness gap in ontology and knowledge graph alignment by integrating large language model embeddings into a CANARD-based, SPARQL-guided framework. It introduces four embedding-based modifications (label embedding similarity, embeddings of the SPARQL query, subgraph embeddings, and instance embeddings) that change how surrounding subgraphs are matched, using pre-trained models with no additional training. In experiments on the populated OAEI Conference benchmark, the approach achieves higher precision and F-measure than the baseline and several state-of-the-art systems, and the experiments also show the impact of each modification. Because the method relies only on user-provided needs expressed as SPARQL queries and on pre-trained embeddings, it is broadly applicable and scalable for complex matching tasks. Directions for future work include pure T-Box strategies and ontology partitioning.
Abstract
Ontology matching, and more broadly Knowledge Graph matching, is a challenging task in which expressiveness has not been fully addressed. Despite the increasing use of embeddings and language models for this task, approaches for generating expressive correspondences still do not take full advantage of these models, in particular large language models (LLMs). This paper proposes to integrate LLMs into an approach for generating expressive correspondences based on alignment needs and ABox-based relation discovery. Correspondences are generated by matching similar surroundings of instance sub-graphs. The integration of LLMs results in different architectural modifications, including label similarity, sub-graph matching, and entity matching. The performance of word embeddings, sentence embeddings, and LLM-based embeddings was compared. The results demonstrate that integrating LLMs surpasses all other models, enhancing the baseline version of the approach with a 45% increase in F-measure.
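
As an illustration of the label similarity modification mentioned above, the following is a minimal sketch of comparing entity labels by the cosine similarity of their embeddings. It is not the implementation used in the paper: the function names, the 0.8 threshold, and the three-dimensional toy vectors are assumptions made for the example, and in practice the `embed` callable would wrap a pre-trained word, sentence, or LLM embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_labels(source_label, target_labels, embed, threshold=0.8):
    """Return (label, score) pairs whose similarity to source_label
    reaches the threshold, best match first.

    `embed` is any callable mapping a label string to a vector;
    the threshold value is an assumption, not a value from the paper.
    """
    source_vec = embed(source_label)
    scored = [(t, cosine(source_vec, embed(t))) for t in target_labels]
    kept = [(t, s) for t, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for a pre-trained embedding model.
TOY_EMBEDDINGS = {
    "accepted paper": [0.9, 0.1, 0.0],
    "paper accepted for publication": [0.8, 0.2, 0.1],
    "conference chair": [0.0, 0.2, 0.9],
}


def toy_embed(label):
    return TOY_EMBEDDINGS[label]


matches = similar_labels(
    "accepted paper",
    ["paper accepted for publication", "conference chair"],
    toy_embed,
)
print(matches)
```

With these toy vectors only the semantically close label passes the threshold. Passing the embedding function as a parameter keeps the comparison logic independent of the model, which mirrors the idea of swapping word, sentence, and LLM-based embeddings without retraining.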
