Implementing a Scalable, Redeployable and Multitiered Repository for FAIR and Secure Scientific Data Sharing: The BIG-MAP Archive
Valeria Granata, Francois Liot, Xing Wang, Steen Lysgaard, Ivano E. Castelli, Tejs Vegge, Nicola Marzari, Giovanni Pizzi
TL;DR
The paper presents the BIG-MAP Archive, a scalable, private data repository for large consortia built on InvenioRDM, designed to enable secure, selective data sharing among BATTERY 2030+ members with fine-grained permissions and tokenized links. It details the underlying technologies, access control mechanisms, anonymized usage metrics, and REST APIs, plus an API Client to simplify integration with external tools like FINALES. Key contributions include a reusable deployment template for other consortia, a roadmap for Keycloak-based authentication, automatic publication workflows to public repositories, and FAIR-by-Design data practices via BattINFO ontology mappings and JSON-LD semantic annotations. The work highlights the repository's potential to scale to additional projects and interoperable data spaces (Materials Cloud, RAISE) while maintaining confidentiality and auditable governance.
Abstract
Data sharing in large consortia, such as research collaborations or industry partnerships, requires addressing both organizational and technical challenges. A common platform is essential to promote collaboration, facilitate the exchange of findings, and ensure secure access to sensitive data. Key technical challenges include creating a scalable architecture, a user-friendly interface, and robust security and access control. The BIG-MAP Archive is a cloud-based, disciplinary, private repository designed to address these challenges. Built on InvenioRDM, it leverages the platform's functionalities to meet consortium-specific needs, providing a more tailored solution than general-purpose repositories. Access can be restricted to members of specific communities or opened to an entire consortium, such as BATTERY 2030+, a consortium accelerating advanced battery technologies. Access to uploaded data and metadata is controlled via fine-grained permissions, which can grant access to individual project members or to the full initiative. The formalized upload process ensures that data are formatted and ready for publication in open repositories when needed. This paper reviews the repository's key features, showing how the BIG-MAP Archive enables secure, controlled data sharing within large consortia. The repository ensures data confidentiality while supporting flexible, permissions-based access, and it can be easily redeployed for other consortia, including MaterialsCommons4.eu and RAISE (Resource for AI Science in Europe).
