IDEA2: Expert-in-the-loop competency question elicitation for collaborative ontology engineering

Elliott Watkiss-Leek, Reham Alharbi, Harry Rostron, Andrew Ng, Ewan Johnson, Andrew Mitchell, Terry R. Payne, Valentina Tamma, Jacopo de Berardinis

Abstract

Competency question (CQ) elicitation represents a critical but resource-intensive bottleneck in ontology engineering. This foundational phase is often hampered by the communication gap between domain experts, who possess the necessary knowledge, and ontology engineers, who formalise it. This paper introduces IDEA2, a novel, semi-automated workflow that integrates Large Language Models (LLMs) within a collaborative, expert-in-the-loop process to address this challenge. The methodology is characterised by a core iterative loop: an initial LLM-based extraction of CQs from requirement documents, a co-creational review and feedback phase by domain experts on an accessible collaborative platform, and an iterative, feedback-driven reformulation of rejected CQs by an LLM until consensus is achieved. To ensure transparency and reproducibility, the entire lifecycle of each CQ is tracked using a provenance model that captures the full lineage of edits, anonymised feedback, and generation parameters. The workflow was validated in two real-world scenarios (scientific data, cultural heritage), demonstrating that IDEA2 can accelerate the requirements engineering process, improve the acceptance and relevance of the resulting CQs, and achieve high usability and effectiveness ratings among domain experts. We release all code and experiments at https://github.com/KE-UniLiv/IDEA2.
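
The core loop described in the abstract can be summarised in pseudocode. The following Python sketch is illustrative only and is not the released implementation (see the GitHub repository for that): the `CQ` and `Feedback` types, the `collect_feedback` and `reformulate` callbacks, and the round-based stopping criterion are all assumptions standing in for IDEA2's LLM and dashboard integrations.

```python
from dataclasses import dataclass, field

@dataclass
class Feedback:
    score: int           # aggregate expert vote, e.g. -1 / 0 / +1
    comments: list[str]  # anonymised qualitative comments
    accepted: bool       # True once the experts reach consensus

@dataclass
class CQ:
    cq_id: int
    text: str
    history: list[dict] = field(default_factory=list)  # provenance: lineage of edits

def elicit_cqs(candidate_cqs, collect_feedback, reformulate, max_rounds=5):
    """Expert-in-the-loop cycle over LLM-extracted candidate CQs.

    collect_feedback(cq) -> Feedback   # expert validation on the dashboard
    reformulate(cq, fb) -> str         # LLM rewrite guided by the feedback
    """
    accepted, pending = [], list(candidate_cqs)
    for round_no in range(max_rounds):          # illustrative stopping criterion
        if not pending:
            break                               # consensus reached on every CQ
        still_pending = []
        for cq in pending:
            fb = collect_feedback(cq)
            cq.history.append({                 # track the full lineage per CQ
                "round": round_no, "text": cq.text,
                "score": fb.score, "comments": fb.comments,
            })
            if fb.accepted:
                accepted.append(cq)
            else:
                cq.text = reformulate(cq, fb)   # feedback-driven reformulation
                still_pending.append(cq)
        pending = still_pending
    return accepted, pending                    # pending = CQs still without consensus
```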

Paper Structure

This paper contains 19 sections, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Overview of the IDEA2 architecture: (1) CQ Extraction generates candidate requirements from heterogeneous sources; (2) these are published to a collaborative dashboard for Expert Validation; (3) feedback drives an Iterative Reformulation loop where rejected CQs are refined by the LLM based on the feedback; and (4) the process terminates upon meeting defined stopping criteria, producing a CQ export.
  • Figure 2: Screenshot of the IDEA2 collaborative dashboard on Notion. The interface shows the CQ Pool (left) and the detailed validation view (right) for a specific requirement (ID 109). In this example, a split vote (Score 0) and qualitative comments from domain experts highlight a domain-specific ambiguity, providing the necessary context to trigger and guide the reformulation loop (a hypothetical per-CQ record of this kind is sketched after this list).
  • Figure 3: Usage metrics derived from the expert evaluation. The Likert-scale charts illustrate the domain experts' agreement regarding the tool's consistency, collaborative capabilities, expressiveness, and intuitiveness across (a) the AnIML scenario ($N=4$) and (b) the Cultural Heritage scenario ($N=3$).
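
The validation view in Figure 2 (a CQ identifier, an aggregate score, and expert comments), combined with the provenance model described in the abstract (lineage of edits, anonymised feedback, generation parameters), suggests a per-CQ record along the following lines. This is a hypothetical sketch: none of the field names or values are taken from the actual IDEA2 schema, and the placeholders in angle brackets stand for real content.

```python
# Hypothetical per-CQ provenance record. Field names and values are
# assumptions for illustration, not IDEA2's actual schema; CQ ID 109 and
# the split vote (score 0) echo the example shown in Figure 2.
cq_provenance = {
    "cq_id": 109,
    "versions": [
        {
            "round": 0,
            "text": "<initial CQ wording extracted by the LLM>",
            "generation": {"model": "<LLM name>", "temperature": 0.2,
                           "source": "<requirement document>"},
            "score": 0,  # split vote, as in Figure 2
            "comments": ["<anonymised expert comment flagging the ambiguity>"],
        },
        {
            "round": 1,
            "text": "<reformulated wording produced from the feedback>",
            "generation": {"model": "<LLM name>", "temperature": 0.2,
                           "trigger": "round-0 expert feedback"},
            "score": 1,  # consensus reached
            "comments": [],
        },
    ],
}
```

Keeping every version together with its feedback and generation parameters is what makes each accepted CQ auditable end to end, supporting the transparency and reproducibility goals stated in the abstract.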