A Protocol for KG Construction Tasks Involving Users

Ademar Crotti Junior, Christophe Debruyne

TL;DR

The paper addresses the lack of standardized protocols for user studies in knowledge graph construction (KGC) by proposing a detailed, open protocol centered on RDF Mapping Language (RML) core functionality and its extensions. It combines literature-informed guidelines with a concrete five-task mapping workflow, accompanied by structured participant procedures and a battery of usability and workload measures (PSSUQ, SUS, WP, NASA-TLX) plus clear accuracy and timing metrics, enabling robust cross-study comparisons. The protocol supports both single-group evaluations and controlled group comparisons, and it offers variants to test extensions or domain adaptations, with explicit statistical analysis plans (normality, variances, correlations) and reliability checks (Cronbach's Alpha $\ge 0.7$). Resources are provided under CC-BY-SA 4.0 with a DOI and a GitHub repository to foster adoption and future repository-based comparisons, with plans to engage the W3C for broader community uptake.
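As an illustration of the reliability check mentioned above, the following is a minimal sketch of how Cronbach's Alpha could be computed for a questionnaire and compared against the 0.7 threshold. The function names, the use of sample variance, and the response data are assumptions made for this example; they are not taken from the protocol's own materials.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's Alpha for a list of participant rows.

    Each row holds one participant's scores for the k questionnaire items.
    Assumption: sample variance (n - 1 denominator) is used throughout.
    """
    if len(responses) < 2:
        raise ValueError("need at least two participants")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("all rows must have the same number of items")

    # Variance of each item across participants.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Variance of each participant's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def is_reliable(responses, threshold=0.7):
    """True if alpha meets the threshold (0.7 is the value named in the TL;DR)."""
    return cronbach_alpha(responses) >= threshold


if __name__ == "__main__":
    # Hypothetical 5-point Likert answers: 4 participants, 3 items.
    sample = [
        [4, 5, 4],
        [3, 3, 2],
        [5, 5, 5],
        [2, 3, 2],
    ]
    print(f"alpha = {cronbach_alpha(sample):.3f}, reliable = {is_reliable(sample)}")
```

In a real study, each questionnaire (for example PSSUQ or SUS) would be checked separately, and any reverse-scored items would need to be recoded before the computation.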

Abstract

Knowledge graph construction (KGC) from (semi-)structured data is challenging, and facilitating user involvement is an issue frequently brought up within this community. We cannot deny the progress we have made with respect to (declarative) knowledge graph construction languages and tools to help build such mappings. However, it is surprising that no two studies report on similar protocols. This heterogeneity does not allow for comparing KGC languages, techniques, and tools. This paper first analyzes studies involving users to identify points of comparison and gaps. These gaps include a lack of consistency in task design, participant selection, and evaluation metrics. Moreover, a systematic way of analyzing the data and reporting the findings is also lacking. We thus propose a user protocol for KGC designed to address this challenge. Where possible, we draw on elements from the literature that we deem fit for such a protocol. The protocol, as such, allows for the comparison of languages and techniques for the RDF Mapping Language (RML) core functionality, which is covered by most of the other state-of-the-art techniques and tools. We also propose how the protocol can be amended to compare extensions (of RML). This protocol provides an important step towards a more comparable evaluation of KGC user studies.

Paper Structure

This paper contains 12 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Entity-Relationship Diagram (ERD) representing the Universe of Discourse (UoD) of the data used in the protocol.