OAEI-LLM: A Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching
Zhangcheng Qiang, Kerry Taylor, Weiqing Wang, Jing Jiang
TL;DR
This work addresses large language model (LLM) hallucinations in ontology matching (OM) by introducing OAEI-LLM, a benchmark that extends the OAEI datasets with LLM-generated alignments $A_{llm}$ alongside the human references $R_{oaei}$, under a one-to-one constraint. It describes how the dataset is constructed, including a three-way error taxonomy (TypeDefinitionMissing, Missing from OAEI, Incorrect) and a schema extension that records hallucination metadata in an extended EDOAL format (potentially using SSSOM). The key contributions are the dataset, the hallucination taxonomy, and the schema extension, together with a discussion of use cases for benchmarking and fine-tuning LLMs in OM. The benchmark aims to support robust evaluation and improvement of LLM-driven ontology matching by providing structured error signals and an extensible schema for future research and practical deployment.
Abstract
Hallucinations of large language models (LLMs) commonly occur in domain-specific downstream tasks, and ontology matching (OM) is no exception. The growing use of LLMs for OM raises the need for benchmarks to better understand LLM hallucinations. The OAEI-LLM dataset is an extended version of the Ontology Alignment Evaluation Initiative (OAEI) datasets, designed to evaluate LLM-specific hallucinations in OM tasks. We outline the methodology used in dataset construction and schema extension, and provide examples of potential use cases.
