Converter: Enhancing Interoperability in Research Data Management

Sefika Efeoglu, Zongxiong Chen, Sonja Schimmler, Bianca Wentzel

TL;DR

The paper addresses interoperability challenges in Research Data Management caused by heterogeneous metadata formats across Berlin University Alliance repositories and their publication vocabularies. It introduces Converter, a schema-matching and transformation pipeline that maps harvested metadata to the DCAT vocabulary (DCAT-AP 2019) and serves as a bridge to the Piveau harvester, enabling DCAT-consistent ingestion. The authors present a pluggable converter service with importers, transformers, and exporters that reduces the need for extensive Piveau adaptations and can be extended to additional data sources within the NFDI initiative, such as NFDI4Cat and NFDI4DataScience. The work improves data accessibility and interoperability by harmonizing heterogeneous resources into a unified DCAT-based framework, supporting FAIR data management and streamlined collaboration across institutions.

Abstract

Research Data Management (RDM) is essential for handling and organizing research data. The Berlin Open Science Platform (BOP) serves as a case study that exemplifies the significance of standardization within the Berlin University Alliance (BUA), whose institutions employ different vocabularies when publishing their data, resulting in data heterogeneity. The meta portals of the NFDI4Cat and NFDI4DataScience projects serve as additional case studies in the context of the NFDI initiative. To establish consistency among the harvested repositories in the respective systems, this study focuses on developing a novel component, the converter, that breaks down barriers between data collection and the various schemas. Requiring only minor modification of the existing Piveau framework, the converter contributes to enhanced data accessibility, streamlined collaboration, and improved interoperability within the research community.

Paper Structure (3 sections, 2 figures)

Figures (2)

  • Figure 1: An overview of three repositories from the institutions of the Berlin University Alliance on the Berlin Open Science Platform.
  • Figure 2: Pipeline: The converter communicates with different repositories and transforms their schemas and vocabularies into a standardized format, i.e., DCAT; the harvester uses an importer to fetch metadata from the converter and exports it to a persistent datastore.
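
The importer, transformer, and exporter stages described for the converter can be illustrated with a minimal sketch. This is not the authors' implementation: the source schema names (`dublin_core`, `custom_repo`), their field names, and all function names are hypothetical, and a plain dictionary stands in for an RDF graph. Only the target property names (`dct:title`, `dct:description`, `dcat:keyword`, `dct:identifier`) come from the DCAT vocabulary.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical per-schema mappings from source fields to DCAT properties.
# Real repositories would need mappings derived from their actual schemas.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "dublin_core": {
        "title": "dct:title",
        "description": "dct:description",
        "subject": "dcat:keyword",
        "identifier": "dct:identifier",
    },
    "custom_repo": {
        "name": "dct:title",
        "abstract": "dct:description",
        "tags": "dcat:keyword",
        "doi": "dct:identifier",
    },
}


def import_records(source: Iterable[Record]) -> List[Record]:
    """Importer: stand-in for fetching metadata records from a repository."""
    return list(source)


def transform(record: Record, schema: str) -> Record:
    """Transformer: rename known source fields to DCAT properties.

    Fields without a mapping are dropped rather than guessed.
    """
    mapping = FIELD_MAPS[schema]
    dataset: Record = {"@type": "dcat:Dataset"}
    for src_field, dcat_prop in mapping.items():
        if src_field in record:
            dataset[dcat_prop] = record[src_field]
    return dataset


def export(datasets: List[Record]) -> List[Record]:
    """Exporter: stand-in for handing DCAT records to the harvester."""
    return datasets


def convert(source: Iterable[Record], schema: str) -> List[Record]:
    """Run the importer -> transformer -> exporter chain for one source."""
    return export([transform(r, schema) for r in import_records(source)])
```

Under these assumptions, supporting a new repository means adding one mapping entry (or one transformer) rather than changing the harvester, which is the design benefit the summary attributes to the pluggable converter.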