GenOM: Ontology Matching with Description Generation and Large Language Model

Yiping Song, Jiaoyan Chen, Renate A. Schmidt

TL;DR

GenOM presents an LLM-enhanced ontology matching framework that semantically enriches concepts via generated definitions, uses embedding-based candidate retrieval, and applies LLM-driven binary judgments to determine equivalence, followed by post-processing with exact matching. Evaluated on five biomedical ontologies from the OAEI Bio-ML track, GenOM demonstrates competitive performance, with larger models like Qwen32B delivering stronger, more threshold-robust results and ablations confirming the value of semantic enrichment and few-shot prompting. The work contributes: (1) a modular GenOM pipeline, (2) a novel evaluation framework for LLM-generated definitions, (3) a cross-model analysis of LLM scales, and (4) evidence that semantic enrichment improves both candidate retrieval and final judgments, outperforming many traditional and recent LLM-based baselines. The findings suggest that LLM-driven semantic enrichment can significantly enhance biomedical ontology alignment and offer practical robustness for real-world semantic interoperability, while outlining directions for dynamic thresholds and expanded alignment types in future work.

Abstract

Ontology matching (OM) plays an essential role in enabling semantic interoperability and integration across heterogeneous knowledge sources, particularly in the biomedical domain, which contains numerous complex concepts related to diseases and pharmaceuticals. This paper introduces GenOM, a large language model (LLM)-based ontology alignment framework that enriches the semantic representations of ontology concepts by generating textual definitions, retrieves alignment candidates with an embedding model, and incorporates exact matching-based tools to improve precision. Extensive experiments on the OAEI Bio-ML track demonstrate that GenOM can often achieve competitive performance, surpassing many baselines, including traditional OM systems and recent LLM-based methods. Further ablation studies confirm the effectiveness of semantic enrichment and few-shot prompting, highlighting the framework's robustness and adaptability.
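
To make the flow described above concrete, the following is a minimal Python sketch of an enrich, retrieve, judge, and exact-match pipeline. It is an illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function stands in for a real embedding model, `judge_equivalent` is a hypothetical stub where an LLM yes/no judgment with a token probability would go, the definitions are hand-written rather than LLM-generated, and all function names, threshold values, and the order in which the steps are combined are assumptions chosen for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[tok] * b[tok] for tok in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def enrich(concept):
    """Join label and definition; in GenOM the definition would be LLM-generated."""
    return f"{concept['label']} {concept.get('definition', '')}".strip()

def retrieve_candidates(source, targets, top_k=2, sim_threshold=0.3):
    """Keep the top_k targets whose similarity to the source passes the threshold."""
    src_vec = embed(enrich(source))
    scored = [(cosine(src_vec, embed(enrich(t))), t) for t in targets]
    scored = [pair for pair in scored if pair[0] >= sim_threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

def judge_equivalent(source, target, similarity):
    """Hypothetical stub: a real system would prompt an LLM and read the
    probability of a 'yes' answer. Here the similarity is simply reused."""
    return similarity

def match_all(sources, targets, prob_threshold=0.5):
    """Collect judged mappings, then add exact label matches as post-processing."""
    mappings = set()
    for s in sources:
        for sim, t in retrieve_candidates(s, targets):
            if judge_equivalent(s, t, sim) >= prob_threshold:
                mappings.add((s["id"], t["id"]))
    for s in sources:
        for t in targets:
            if s["label"].lower() == t["label"].lower():
                mappings.add((s["id"], t["id"]))
    return mappings

# Illustrative data only; the definitions are hand-written placeholders.
s1 = {"id": "s1", "label": "myocardial infarction",
      "definition": "death of heart muscle caused by blocked blood flow"}
s2 = {"id": "s2", "label": "Asthma"}
t1 = {"id": "t1", "label": "heart attack",
      "definition": "death of heart muscle caused by blocked blood flow to the heart"}
t2 = {"id": "t2", "label": "renal failure", "definition": "loss of kidney function"}
t3 = {"id": "t3", "label": "asthma"}

result = match_all([s1, s2], [t1, t2, t3])
print(sorted(result))
```

In this toy data, "myocardial infarction" and "heart attack" share no label tokens, so the pair only becomes retrievable once the definitions are included in the embedded text. That mirrors, in a very simplified way, the role the abstract attributes to semantic enrichment.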

Paper Structure

This paper contains 24 sections, 2 equations, 5 figures, 10 tables, and 1 algorithm.

Figures (5)

  • Figure 1: The Architecture of GenOM
  • Figure 2: Template and example of concept information used for embedding-based representation.
  • Figure 3: F1-Scores of GenOM under the LLM-only setting when the thresholds of cosine similarity and token probability are set to different values. Results of two matching tasks using Qwen7B and Qwen32B are reported.
  • Figure 4: Impact of concept definitions on LLM-based local ranking (Qwen32B).
  • Figure 5: Impact of concept definitions on embedding-based candidate retrieval (text-embedding-3-small).