An Ecosystem of Services for FAIR Computational Workflows

Sean R. Wilkinson, Johan Gustafsson, Finn Bacall, Khalid Belhajjame, Salvador Capella, Jose Maria Fernandez Gonzalez, Jacob Fosso Tande, Luiz Gadelha, Daniel Garijo, Patricia Grubel, Bjorn Grüning, Farah Zaib Khan, Sehrish Kanwal, Simone Leo, Stuart Owen, Luca Pireddu, Line Pouchard, Laura Rodríguez-Navas, Beatriz Serrano-Solano, Stian Soiland-Reyes, Baiba Vilne, Alan Williams, Merridee Ann Wouters, Frederik Coppens, Carole Goble

TL;DR

The paper addresses making computational workflows Findable, Accessible, Interoperable, and Reusable by extending FAIR principles to workflows and data/software integration. It presents the EOSC-Life FAIR Workflow Collaboratory as a concrete ecosystem combining metadata standards (Bioschemas, EDAM), a canonical description (CWL/WDL), and packaging (RO-Crate) with GA4GH protocols (TRS/WES) and registries (WorkflowHub) to enable end-to-end FAIRness. It details an interoperable services landscape, lifecycle support, and stakeholder roles to promote reuse, reproducibility, and cross-domain adoption. It also discusses challenges in standardization, portability, reproducibility, and quality, outlining paths for ongoing community governance and tooling to sustain FAIR workflows.

Abstract

Computational workflows, regardless of their portability or maturity, represent major investments of both effort and expertise. They are first-class, publishable research objects in their own right. They are key to sharing methodological know-how for reuse, reproducibility, and transparency. Consequently, the application of the FAIR principles to workflows is inevitable to enable them to be Findable, Accessible, Interoperable, and Reusable. Making workflows FAIR would reduce duplication of effort, assist in the reuse of best practice approaches and community-supported standards, and ensure that workflows as digital objects can support reproducible and robust science. FAIR workflows also encourage interdisciplinary collaboration, enabling workflows developed in one field to be repurposed and adapted for use in other research domains. FAIR workflows draw from both FAIR data and software principles. Workflows propose explicit method abstractions and tight bindings to data, so many of the data principles apply. Meanwhile, as executable pipelines with a strong emphasis on code composition and data flow between steps, the software principles apply, too. As workflows are chiefly concerned with the processing and creation of data, they also have an important role to play in ensuring and supporting data FAIRification. The FAIR Principles for software and data mandate the use of persistent identifiers (PIDs) and machine-actionable metadata associated with workflows to enable findability, accessibility, interoperability, and reusability. Implementing the principles requires a PID and metadata framework with appropriate programmatic protocols, an accompanying ecosystem of services, tools, guidelines, policies, and best practices, as well as the buy-in of existing workflow systems such that they adapt in order to adopt. The European EOSC-Life Workflow Collaboratory is an example of such a ...

Paper Structure

This paper contains 24 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: EOSC-Life FAIR Workflow Collaboratory
  • Figure 2: An RO-Crate for a Workflow with Bioschemas, CWL and PIDs, including Contextual Entities. Non-contextual entities can be "attached" (included) or "detached" (referenced). Contextual Entities refer to PIDs such as ORCID, ROR, and RAiD in Table \ref{tab:3}. Illustration based on [Soiland-Reyes et al. 2023].
  • Figure 3: WorkflowHub entry hospital_2024.
  • Figure 4: The Workflow Lifecycle: FAIR by Design. Life cycle adapted from gustafsson_2024.
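
The packaging described in the Figure 2 caption can be illustrated with a minimal sketch of an RO-Crate-style metadata document, built here as a plain Python dictionary. This is not taken from the paper: the function name `build_workflow_crate`, the file name `workflow.cwl`, the data URL, and the ORCID and ROR identifiers are hypothetical placeholders, and the layout follows the general RO-Crate 1.1 JSON-LD conventions rather than any specific WorkflowHub entry.

```python
import json


def build_workflow_crate(workflow_path, author_orcid, org_ror, remote_data_url):
    """Return an RO-Crate-style JSON-LD document as a plain dict (illustrative only)."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor: points at the root dataset it describes
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: names the main workflow and lists its parts
                "@id": "./",
                "@type": "Dataset",
                "mainEntity": {"@id": workflow_path},
                "hasPart": [{"@id": workflow_path}, {"@id": remote_data_url}],
            },
            {   # "Attached" entity: the workflow file is included in the crate
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "programmingLanguage": {"@id": "#cwl"},
                "author": {"@id": author_orcid},
            },
            {   # "Detached" entity: referenced by URL, not included in the crate
                "@id": remote_data_url,
                "@type": "File",
            },
            # Contextual entities: describe the language, people and
            # organizations, identified by PIDs where available
            {"@id": "#cwl", "@type": "ComputerLanguage",
             "name": "Common Workflow Language"},
            {"@id": author_orcid, "@type": "Person",
             "affiliation": {"@id": org_ror}},
            {"@id": org_ror, "@type": "Organization"},
        ],
    }


# All identifiers below are hypothetical placeholders, not real records.
crate = build_workflow_crate(
    workflow_path="workflow.cwl",
    author_orcid="https://orcid.org/0000-0000-0000-0000",
    org_ror="https://ror.org/00example",
    remote_data_url="https://example.org/data/input.csv",
)
print(json.dumps(crate, indent=2))
```

In this sketch the relative `@id` marks an attached file and the absolute URL marks a detached one, while the person and organization entries carry PIDs as their identifiers, mirroring the distinction the caption draws.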