OAEI-LLM: A Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching

Zhangcheng Qiang; Kerry Taylor; Weiqing Wang; Jing Jiang

TL;DR

This work addresses the problem of large language model hallucinations in ontology matching (OM) by introducing OAEI-LLM, a benchmark that extends the OAEI datasets with LLM-generated alignments $A_{llm}$ and human-curated reference alignments $R_{oaei}$ under a one-to-one constraint. It describes the methodology for constructing the dataset, including a three-way error taxonomy (TypeDefinitionMissing, Missing from OAEI, Incorrect) and a schema extension that records hallucination metadata via an extended EDOAL framework (potentially using SSSOM). The key contributions are the dataset, the hallucination taxonomy, and the schema extension, together with a discussion of use cases for benchmarking and fine-tuning LLMs in OM. By providing structured error signals and an extensible schema, the benchmark aims to support robust evaluation and improvement of LLM-driven ontology matching in both future research and practical deployment.
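The comparison at the core of dataset construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each alignment is assumed to be a dictionary from source entity to target entity (sufficient under the one-to-one constraint), and the labels `correct`, `incorrect`, `missing` and `missing_from_oaei`, the helper names `is_one_to_one` and `compare`, and the entity names are all hypothetical stand-ins rather than the paper's exact taxonomy.

```python
from typing import Dict, Tuple

# Hypothetical representation: under the one-to-one constraint each source entity
# maps to at most one target entity, so an alignment fits in a dict {source: target}.
Alignment = Dict[str, str]


def is_one_to_one(alignment: Alignment) -> bool:
    """Dict keys are already unique; also require that no target is reused."""
    return len(set(alignment.values())) == len(alignment)


def compare(a_llm: Alignment, r_oaei: Alignment) -> Dict[str, Tuple[str, str, str]]:
    """Label every source entity that appears in either alignment.

    Returns {source: (label, llm_target, reference_target)}, with "" for an
    absent target. Labels are illustrative placeholders:
      correct            LLM and reference agree on the target
      incorrect          both map the source, but to different targets
      missing            the reference maps the source, the LLM does not
      missing_from_oaei  the LLM maps the source, the reference does not
    """
    if not (is_one_to_one(a_llm) and is_one_to_one(r_oaei)):
        raise ValueError("both alignments must satisfy the one-to-one constraint")
    labels = {}
    for source in sorted(set(a_llm) | set(r_oaei)):
        llm_target = a_llm.get(source, "")
        ref_target = r_oaei.get(source, "")
        if llm_target and ref_target:
            label = "correct" if llm_target == ref_target else "incorrect"
        elif ref_target:
            label = "missing"
        else:
            label = "missing_from_oaei"
        labels[source] = (label, llm_target, ref_target)
    return labels


# Illustrative entity names, not taken from a specific OAEI reference alignment.
a_llm = {"cmt:Paper": "conf:Paper", "cmt:Author": "conf:Reviewer", "cmt:Chair": "conf:Chair"}
r_oaei = {"cmt:Paper": "conf:Paper", "cmt:Author": "conf:Author", "cmt:Review": "conf:Review"}

for source, (label, _, _) in compare(a_llm, r_oaei).items():
    print(source, label)
```

Because each source entity has at most one target on each side, every disagreement can be attributed to a single entity, which is one reason a one-to-one constraint keeps this comparison simple.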

Abstract

Hallucinations of large language models (LLMs) commonly occur in domain-specific downstream tasks, and ontology matching (OM) is no exception. The growing use of LLMs for OM raises the need for benchmarks that help us better understand LLM hallucinations. The OAEI-LLM dataset is an extended version of the Ontology Alignment Evaluation Initiative (OAEI) datasets, designed to evaluate LLM-specific hallucinations in OM tasks. We outline the methodology used for dataset construction and schema extension, and provide examples of potential use cases.

Paper Structure

This paper contains 9 sections, 1 figure, and 1 table.

Motivation
Methodology
Dataset Construction
Schema Extension
Potential Use Cases
Benchmarking LLMs for OM Tasks
A Dataset for Fine-tuning LLMs Used in OM Tasks
Limitations
Further Work

Figures (1)

Figure 1: Procedure for constructing the dataset: an LLM Alignment of the source and target ontologies is compared with the OAEI Reference and recorded in the OAEI-LLM Benchmark.
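The final step in Figure 1, recording each compared mapping in the benchmark, can be sketched as a small serializer. This is a hypothetical layout, not the paper's EDOAL extension: the `subject_id`, `predicate_id` and `object_id` columns borrow SSSOM slot names, while `llm_object_id`, `hallucination_type`, `Record` and `to_benchmark_tsv` are invented for illustration, and the output is not claimed to be a valid SSSOM file. The labels are assumed to come from an earlier comparison step.

```python
import csv
import io
from typing import List, Tuple

# (source entity, LLM target or "", OAEI reference target or "", hallucination label)
Record = Tuple[str, str, str, str]

# subject_id, predicate_id and object_id borrow SSSOM slot names; llm_object_id
# and hallucination_type are hypothetical extension columns for this sketch.
FIELDS = ["subject_id", "predicate_id", "object_id", "llm_object_id", "hallucination_type"]


def to_benchmark_tsv(records: List[Record]) -> str:
    """Serialize labelled comparison records as a tab-separated table."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for source, llm_target, ref_target, label in records:
        writer.writerow({
            "subject_id": source,
            "predicate_id": "skos:exactMatch",  # assumes equivalence mappings
            "object_id": ref_target,             # empty if the reference has no mapping
            "llm_object_id": llm_target,         # empty if the LLM produced no mapping
            "hallucination_type": label,
        })
    return buf.getvalue()


# Illustrative records with hypothetical entity names and labels.
records = [
    ("cmt:Paper", "conf:Paper", "conf:Paper", "none"),
    ("cmt:Author", "conf:Reviewer", "conf:Author", "incorrect"),
    ("cmt:Review", "", "conf:Review", "missing"),
]
print(to_benchmark_tsv(records), end="")
```

Keeping the reference target and the LLM target in separate columns lets a consumer recover both alignments from one table and filter by error label.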
