The Reproducible Research Platform establishes a unified open science environment bridging data and software lifecycles across disciplines, from proposal to publication

Andreas P. Cuny, Henry Lütcke, Andrei-Valentin Plamadă, Antti Luomi, John Hennig, Matthew Baker, Fabian Rudolf, Bernd Rinn

TL;DR

The paper introduces the Reproducible Research Platform (RRP), an open-source, project-centric environment that unifies research data management with containerized, executable computational environments to achieve FAIR-by-design reproducibility across diverse disciplines. Built on Kubernetes, openBIS RDMS, Git, and REES-based environment specifications, RRP enables one-click reproducibility, easy collaboration, and seamless publication workflows through features like player bundles/scripts and DOIs. The authors demonstrate RRP’s applicability by reproducing results from studies spanning over a decade in fields such as diagnostics, archaeology, and neuroscience, illustrating robust, cross-domain reproducibility and usability. The work argues that broad adoption of RRP could transform reproducible science into a default practice, reduce wasted effort from irreproducible studies, and foster ongoing development of domain-specific templates and tools within an open, interoperable ecosystem.

Abstract

Many research groups aspire to make data and code FAIR and reproducible, yet struggle because the data and code life cycles are disconnected, executable environments are often missing from published work, and technical skill requirements hinder adoption. Existing approaches rarely enable researchers to keep using their preferred tools or support seamless execution across domains. To close this gap, we developed the open-source Reproducible Research Platform (RRP), which unifies research data management with version-controlled, containerized computational environments in modular, shareable projects. RRP enables anyone to execute, reuse, and publish fully documented, FAIR research workflows without manual retrieval or platform-specific setup. We demonstrate RRP's impact by reproducing results from diverse published studies, including work over a decade old, showing sustained reproducibility and usability. With a minimal graphical interface focused on core tasks, modular tool installation, and compatibility with institutional servers or local computers, RRP makes reproducible science broadly accessible across scientific domains.

Paper Structure

This paper contains 22 sections, 15 figures, 2 tables.

Figures (15)

  • Figure 1: Data and code lifecycle of research projects and overview of RRP, an integrated, reproducible, and executable data storage and analysis platform. a Typical data and code lifecycles, which are often treated as separate processes even though both begin with a common research question and potential data and code reuse, shown with their key steps up to publication and community engagement. b Research management within RRP, integrating both data and code within one platform. Research data are managed using the research data management system (RDMS), openBIS ELN-LIMS. Analysis scripts, code, and computational environment definitions are managed with Git (using platforms like GitLab or GitHub). With RRP, reproducible computational environments are created as Docker containers from the code repository using the open-source tool repo2docker. Users can access computational environments through popular user interfaces like JupyterLab. RRP directly mounts the relevant data stored in the RDMS. RRP enables the sharing of self-contained, executable projects, facilitating collaboration with colleagues within the research group or with researchers worldwide. Overall, RRP accelerates research project management from planning until publication and beyond.
  • Figure 2: Schematic of RRP's system architecture. RRP's components are grouped into "core" and "project" and interactions with RRP project specifications (Git repository), RDMS (openBIS), and public Open Container Initiative (OCI) registry (Docker Hub) are indicated with arrows.
  • Figure 3: New RRP project specification. a,b Step 1: Registering the required research data in the RDMS generates unique identifiers (permID) for the datasets. c,d Step 2: Definition of an RRP Git project. The example folder listing shows the minimally required folders and optional files in the repository root. e,f Step 3: Definition of the datasets to be mounted in RRP from the RDMS. f Example of a datasets.yaml file with the server, the folder name of the dataset on the RDMS, and the permID of the dataset. g New data can be added at any time, for example, in the RRP GUI. h,i Step 4: Definition of the computational environment for the project. i Example of system libraries (apt.txt), runtime version (runtime.txt), and packages (requirements.txt). j,k Step 5: Optionally, an analysis notebook/workflow (k) describing how data is transformed into results, as well as additional files relevant to the research project, can be added to be tracked. l,m Step 6: Once an RRP Git project is fully defined, it can be created (built) from the RRP GUI (m). It is also possible to open an existing RRP project present in the RDMS (openBIS) or from a share identifier provided by a colleague. n The project's computational environment is then built, and the project can be started, stopped, or deleted. o The Details tab of the project gives an overview of the Git status and allows allocating the required computational resources (CPU, RAM). p When entering the project, the Git repository from d is now built into a container and its file contents are present in the /project subfolder. The data is mounted into the /openbis subfolder, and results can be saved in the /results subfolder.
  • Figure 4: Steps for collaboration and publishing an RRP project. a,b For collaboration (optional Step 7), one can share the current stage of the project by generating a Share identifier from within the Share tab of an RRP project in the GUI. Others do not have to build the computational environment and simply obtain a clone. c To share the RRP project with anyone or for offline use (e.g., outside the research group), one can bundle the project into a Player bundle (which includes all the data) or generate a Player script (without data, reduced archive size). However, the latter requires exporting the datasets first. Finally, the whole image can be exported to the RDMS to snapshot an important milestone. d,e To inspect the results of an RRP project (optional Step 8), one does not need to start it. When saved within the /results folder, they can be inspected from the RRP GUI. f,g Registering an RRP project or any results in the RDMS can be done from the Upload tab in optional Step 9. g Here, any data type can be registered in the RDMS and the RRP project can be attached to a project in the ELN-LIMS. h,i In Step 10, the RRP project can be published through the RDMS to obtain a DOI (e.g., via Zenodo). j The Logs tab lists information from the different interactions with an RRP project.
  • Figure 5: Examples of reproducing published research within RRP. a Workflow of the research from raw data management to statistical analysis in quantifying rapid diagnostic test line signals. b RRP Git folder content. c Differences between the results obtained within RRP and the published results (cuny_pypocquant_2021). The mean (dark grey), SD (grey), and percent coefficient of variation (light grey) are plotted for each POCT and TL1. d Workflow of research from data collection to analysis in the field of Archaeology. e RRP Git folder content with two analysis notebooks for directly reproducing the published study and one with the data registered to our RDMS. f Resulting figure generated in RRP corresponding to Fig. 5 in clarkson_archaeology_2015. g Workflow of research from data collection to analysis in the field of Neuroscience and ML. h RRP Git folder content mounting published work. i Resulting figure generated in RRP (with adjusted size) corresponding to Fig. 3 in beam_data-driven_2021.
  • ...and 10 more figures
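
The project specification outlined in the Figure 3 caption (Steps 3 and 4) can be sketched as a small scaffolding script. This is only an illustration under stated assumptions, not RRP's own tooling: the function name `scaffold_project`, the keys used inside `datasets.yaml` (`server`, `folder`, `permId`), and every example value (server URL, folder name, permID, package names) are hypothetical. Only the file names `datasets.yaml`, `apt.txt`, `runtime.txt`, and `requirements.txt` and their roles come from the caption.

```python
from pathlib import Path
import tempfile


def scaffold_project(root: Path, server: str, folder: str, perm_id: str) -> list[str]:
    """Write the specification files named in the Figure 3 caption into `root`.

    The datasets.yaml layout below is an assumed schema for illustration;
    the real RRP schema may differ.
    """
    root.mkdir(parents=True, exist_ok=True)
    files = {
        # Step 3: which RDMS dataset to mount (hypothetical key names).
        "datasets.yaml": (
            "datasets:\n"
            f"  - server: {server}\n"
            f"    folder: {folder}\n"
            f"    permId: {perm_id}\n"
        ),
        # Step 4: environment definition files as listed in the caption.
        "apt.txt": "libgl1\n",                   # system libraries (example value)
        "runtime.txt": "python-3.10\n",          # runtime version (example value)
        "requirements.txt": "numpy\npandas\n",   # packages (example values)
    }
    for name, content in files.items():
        (root / name).write_text(content, encoding="utf-8")
    return sorted(files)


if __name__ == "__main__":
    # Usage: create the files in a throwaway directory and list them.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_project(
            Path(tmp) / "my-rrp-project",
            server="https://openbis.example.org",  # placeholder server
            folder="raw_images",                   # placeholder folder name
            perm_id="20240101120000000-1",         # placeholder permID
        )
        print(created)
```

Keeping the dataset reference and the environment definition as plain text files in the Git repository is what lets the same project be rebuilt later from the repository alone, with the data mounted from the RDMS rather than copied into it.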