Fostering the integration of European Open Data into Data Spaces through High-Quality Metadata

Javier Conde, Alejandro Pozo, Andrés Munoz-Arcentales, Johnny Choque, Álvaro Alonso

TL;DR

The paper addresses the integration of European Open Data into Data Spaces by identifying metadata quality as a key barrier and delivering an automated toolchain for data transformation, DCAT-compliant metadata generation, and metadata quality assessment. It combines NiFi for ETL, CKAN-based data/metadata management, and DCAT/OAI-PMH interoperability to enable scalable publication of Open Data Portals, demonstrated through the YODA/Open Data ecosystem and Data.europa.eu integration. The main contributions include the mqa-scoring-api for pre-harvest metadata evaluation and a CKAN extension for DCAT-AP compatibility, validated on over 200 datasets with outstanding metadata scores. The work advances practical pathways for FAIR-compliant, interoperable data sharing within European Data Spaces, with demonstrated impact on metadata discoverability and reuse in real-world portals.

Abstract

The term Data Space, understood as the secure exchange of data in distributed systems that ensures openness, transparency, decentralization, sovereignty, and interoperability of information, has gained importance in recent years. However, Data Spaces are still in an initial phase of definition, and new research is necessary to address their requirements. The Open Data ecosystem can be understood as one of the precursors of Data Spaces, as it provides mechanisms to ensure the interoperability of information through resource discovery, information exchange, and aggregation via metadata. Data Spaces, though, require more advanced capabilities, including the automatic and scalable generation and publication of high-quality metadata. In this work, we present a set of software tools that facilitate the automatic generation and publication of metadata, the modeling of datasets through standards, and the assessment of the quality of the generated metadata. We validate all these tools through the YODA Open Data Portal, showing how they can be connected to integrate Open Data into Data Spaces.

Paper Structure (20 sections, 2 figures, 3 tables)


Figures (2)

  • Figure 1: Phases of automatic publication of high-quality OD
  • Figure 2: Quality of Open Data Portals harvested by data.europa.eu (November, 2023)