Implementing a Scalable, Redeployable and Multitiered Repository for FAIR and Secure Scientific Data Sharing: The BIG-MAP Archive

Valeria Granata, Francois Liot, Xing Wang, Steen Lysgaard, Ivano E. Castelli, Tejs Vegge, Nicola Marzari, Giovanni Pizzi

TL;DR

The paper presents the BIG-MAP Archive, a scalable, private data repository for large consortia built on InvenioRDM, designed to enable secure, selective data sharing among BATTERY 2030+ members with fine-grained permissions and tokenized links. It details the underlying technologies, access control mechanisms, anonymized usage metrics, and REST APIs, plus an API Client to simplify integration with external tools like FINALES. Key contributions include a reusable deployment template for other consortia, a roadmap for Keycloak-based authentication, automatic publication workflows to public repositories, and FAIR-by-Design data practices via BattINFO ontology mappings and JSON-LD semantic annotations. The work highlights the repository's potential to scale to additional projects and interoperable data spaces (Materials Cloud, RAISE) while maintaining confidentiality and auditable governance.

Abstract

Data sharing in large consortia, such as research collaborations or industry partnerships, requires addressing both organizational and technical challenges. A common platform is essential to promote collaboration, facilitate the exchange of findings, and ensure secure access to sensitive data. Key technical challenges include creating a scalable architecture, a user-friendly interface, and robust security and access control. The BIG-MAP Archive is a cloud-based, disciplinary, private repository designed to address these challenges. Built on InvenioRDM, it leverages the platform's functionalities to meet consortium-specific needs, providing a more tailored solution than general-purpose repositories. Access can be restricted to members of specific communities or opened to an entire consortium, such as BATTERY 2030+, an initiative accelerating advanced battery technologies. Access to uploaded data and metadata is controlled via fine-grained permissions, which can grant access to the members of an individual project or to the full initiative. The formalized upload process ensures that data are formatted and ready for publication in open repositories when needed. This paper reviews the repository's key features, showing how the BIG-MAP Archive enables secure, controlled data sharing within large consortia. It ensures data confidentiality while supporting flexible, permissions-based access, and it can be easily redeployed for other consortia, including MaterialsCommons4.eu and RAISE (Resource for AI Science in Europe).
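The multitiered access described above (community-only sharing, consortium-wide sharing, and tokenized share links) can be illustrated with a minimal sketch. This is a toy model and not the InvenioRDM or BIG-MAP Archive implementation: the names `Visibility`, `User`, `Record`, and `can_view` are hypothetical, and the rule that a share link is only honoured for consortium members is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, FrozenSet, Optional


class Visibility(Enum):
    COMMUNITY = "community"    # shared with the members of one project only
    CONSORTIUM = "consortium"  # shared with every member of the initiative


@dataclass(frozen=True)
class User:
    name: str
    communities: FrozenSet[str]
    is_consortium_member: bool = True


@dataclass
class Record:
    title: str
    community: str
    visibility: Visibility
    # Hypothetical mapping: share-link token -> permission level ("view", "edit")
    share_links: Dict[str, str] = field(default_factory=dict)


def can_view(user: User, record: Record, token: Optional[str] = None) -> bool:
    """Return True if the user may view the record under this toy model."""
    if not user.is_consortium_member:
        # Assumption: the repository is private, so outsiders never get access.
        return False
    if record.visibility is Visibility.CONSORTIUM:
        return True
    if record.community in user.communities:
        return True
    # Fall back to a tokenized share link, if a valid one was presented.
    return token is not None and token in record.share_links


if __name__ == "__main__":
    alice = User("alice", frozenset({"project-a"}))
    bob = User("bob", frozenset({"project-b"}))
    record = Record("cell data", "project-a", Visibility.COMMUNITY, {"tok123": "view"})
    print(can_view(alice, record))                  # member of the owning community
    print(can_view(bob, record))                    # different community, no link
    print(can_view(bob, record, token="tok123"))    # access through a share link
```

The checks are ordered from the broadest tier to the narrowest, so a share link only matters when neither consortium-wide visibility nor community membership already grants access.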

Paper Structure

This paper contains 13 sections, 5 figures.

Figures (5)

  • Figure 1: Landing page of the BIG-MAP Archive, a domain-specific and private repository. Access to the repository is restricted to members of the BATTERY 2030+ initiative.
  • Figure 2: Users can choose to share their records either with members of their own project (called a "community" in the language of InvenioRDM) or with all members of the BATTERY 2030+ initiative. Sharing records in the BIG-MAP Archive requires authors to curate their data and prepare metadata. This makes it easier for them to publish their results afterwards in public repositories such as Zenodo and the Materials Cloud Archive.
  • Figure 3: In addition to sharing with communities, users can grant more restricted access permissions through share links to provide specific access (view-only, edit, ...) to selected users with whom the link is shared.
  • Figure 4: An example of a record page. The BIG-MAP Archive tracks usage statistics for each record and its versions, and the individual and cumulative statistics are displayed on each record (see bottom part of the figure).
  • Figure 5: The Materials Cloud Archive is a moderated and public repository for computational materials science. With a single click, users will be able to publish records shared on the BIG-MAP Archive directly to the Materials Cloud Archive.