WorkflowHub: a registry for computational workflows

Ove Johan Ragnar Gustafsson, Sean R. Wilkinson, Finn Bacall, Luca Pireddu, Stian Soiland-Reyes, Simone Leo, Stuart Owen, Nick Juty, José M. Fernández, Björn Grüning, Tom Brown, Hervé Ménager, Salvador Capella-Gutierrez, Frederik Coppens, Carole Goble

TL;DR

WorkflowHub presents a global, domain-agnostic registry that unifies diverse computational workflow resources to improve findability, accessibility, interoperability, and reusability. It establishes a data model that reflects real-world collaborations (Organisations, Teams, Spaces) and integrates with standards (Bioschemas, EDAM, RO-Crate) and platforms (GA4GH TRS, LifeMonitor) to enable end-to-end lifecycle support from development to citation. Key contributions include a wizard-driven workflow registration flow, robust attribution and citation via DOIs and RO-Crates, and extensive integrations that connect workflows to external repositories, execution platforms, and scholarly infrastructures. This registry aims to accelerate reproducible science by providing credit, improving discoverability, and enabling cross-domain workflow reuse, with active community engagement and ongoing onboarding of new domains and communities.

Abstract

The rising popularity of computational workflows is driven by the need for repetitive and scalable data processing, sharing of processing know-how, and transparent methods. As both combined records of analysis and descriptions of processing steps, workflows should be reproducible, reusable, adaptable, and available. Workflow sharing presents opportunities to reduce unnecessary reinvention, promote reuse, increase access to best practice analyses for non-experts, and increase productivity. In reality, workflows are scattered and difficult to find, in part due to the diversity of available workflow engines and ecosystems, and because workflow sharing is not yet part of research practice. WorkflowHub provides a unified registry for all computational workflows that links to community repositories, and supports both the workflow lifecycle and making workflows findable, accessible, interoperable, and reusable (FAIR). By interoperating with diverse platforms, services, and external registries, WorkflowHub adds value by supporting workflow sharing, explicitly assigning credit, enhancing FAIRness, and promoting workflows as scholarly artefacts. The registry has a global reach, with hundreds of research organisations involved, and more than 700 workflows registered.

Paper Structure

This paper contains 29 sections and 4 figures.

Figures (4)

  • Figure 1: WorkflowHub connects to platforms, services, and resources that support a workflow's life cycle courbebaisse_research_2023. A researcher initially needs to Plan & Find, where they either plan a particular analysis and find existing workflows (e.g. using a registry), or Develop a new workflow. WorkflowHub integrates with Git repositories (e.g. GitHub, GitLab) and Git-supported communities (e.g. nf-core) to support development. A workflow requires Test & Review before it can Run & Deploy; here WorkflowHub connects to support services (e.g. LifeMonitor, bio.tools, Sapporo WES, WfExS) and welcomes diverse workflow platforms that aid deployment (e.g. CWL, Snakemake, Galaxy, Jupyter, Python, Bash, WDL, Nextflow). A creator needs to Share a workflow and can benefit from WorkflowHub's use of citation infrastructures and standards (e.g. CITATION.cff, Zenodo, DataCite, DOI, and ORCID). In the Maintain & Learn stage, maintenance, and understanding of a workflow by other researchers, become critical because they affect Reuse & Rework, where a workflow is either reused or adapted by other researchers to suit their requirements. WorkflowHub supports these stages through registration of digital objects that enrich a workflow (e.g. documents, publications, SOPs), the ability to create Collections and DOI-based workflow citations, and ultimately through the connections it creates to knowledge graphs. WorkflowHub also enables communities of practice to benefit from all its integrations and connections, ensuring that they can reuse or rework workflows from across the globe. The entire support framework rests on standards that allow WorkflowHub to interact with the ecosystem and truly act as a "Hub": EDAM, Research Object Crates (RO-Crates), GA4GH APIs, abstract Common Workflow Language (CWL), FAIR Signposting, and Bioschemas.
  • Figure 2: Workflow types registered with WorkflowHub.
  • Figure 3: A guide to the structures in WorkflowHub. You, the user, belong to one or more Organisations (i.e. affiliations). You can also belong to one or more Teams, each of which also needs to belong to a single Space (top). You can nominate which Organisations you wish to use for the different Teams that you have created or joined, and you can belong to multiple Teams in the same Space, as well as multiple Teams in other Spaces (bottom). Image reused with permission from WorkflowHub documentation.
  • Figure 4: Two example entries in WorkflowHub (left: koster_snakemake-workflowsdna-seq-varlociraptor_2023, right: silver_find_2024) with sections of the user interface annotated, each entry using the flexible features of WorkflowHub in distinct ways. Entry features include A) workflow type, B) title, C) access panel with links to the source repository (e.g. GitHub), requests to contact the creators, subscribe / unsubscribe, download the Research Object Crate (RO-Crate), add to a Collection, and, in the right-hand example, access to administrative menus such as Add new (e.g. document, SOP) and Actions (e.g. edit or manage the workflow, including versions and minting DOIs), D) tabs for navigation between the entry overview, the list of files in the entry, and lists of items related to the workflow, including people, Teams, Spaces, Organisations, and other digital objects (e.g. publications, documents, SOPs, other workflows), E) description, which can be imported from Git, if available, F) version history, including Git commits, if available, G) creator and submitter information, H) links to more information about tools that comprise the workflow (i.e. bio.tools registry entries), I) license information, J) activity metrics (i.e. downloads and views), K) ontology concept annotations (e.g. EDAM in the left example entry), L) workflow diagram, M) parsed workflow inputs, outputs, and steps for specific WfMS (e.g. Galaxy in the right example entry), N) buttons for launching workflows on execution platforms (e.g. Galaxy in the right example entry), O) citation for the workflow, using either information from a minted DOI or a custom citation (e.g. a workflow publication), P) custom tags, and Q) Collections that include the current workflow entry.
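The membership rules described in Figure 3 (a user has one or more Organisations, joins Teams, each Team belongs to exactly one Space, and the user nominates which Organisations to use per Team) can be illustrated with a small data model. This is a minimal sketch under those stated rules only; the class and method names (`Space`, `Team`, `User`, `join_team`, `spaces`) are hypothetical and are not WorkflowHub's actual implementation or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Space:
    """A Space groups related Teams (hypothetical model, not WorkflowHub code)."""
    name: str


@dataclass(frozen=True)
class Team:
    """A Team; the single `space` field encodes 'each Team belongs to one Space'."""
    name: str
    space: Space


@dataclass
class User:
    """A user with Organisation affiliations and Team memberships."""
    name: str
    organisations: set[str] = field(default_factory=set)
    # Maps each joined Team to the subset of the user's Organisations
    # that the user has nominated for that Team.
    memberships: dict[Team, set[str]] = field(default_factory=dict)

    def join_team(self, team: Team, nominated: set[str]) -> None:
        # Only Organisations the user is already affiliated with may be nominated.
        unknown = nominated - self.organisations
        if unknown:
            raise ValueError(f"not affiliated with: {sorted(unknown)}")
        self.memberships[team] = set(nominated)

    def spaces(self) -> set[Space]:
        # A user reaches Spaces only indirectly, through Team membership.
        return {team.space for team in self.memberships}


if __name__ == "__main__":
    research = Space("Research Space")
    other = Space("Other Space")
    alice = User("Alice", organisations={"Uni A", "Institute B"})
    # Two Teams in the same Space, plus one Team in another Space.
    alice.join_team(Team("Team 1", research), {"Uni A"})
    alice.join_team(Team("Team 2", research), {"Institute B"})
    alice.join_team(Team("Team 3", other), {"Uni A", "Institute B"})
    print(sorted(space.name for space in alice.spaces()))
```

Holding the Space as a single field on each Team, rather than as a list, makes the "exactly one Space per Team" rule structural instead of something that has to be checked at runtime.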
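Figure 4 shows that each entry can be downloaded as a Research Object Crate (RO-Crate). The sketch below builds a minimal `ro-crate-metadata.json` structure for a single workflow file. It follows general RO-Crate 1.1 conventions (metadata descriptor, root `Dataset`, and a main workflow entity typed as `ComputationalWorkflow`). The function name `workflow_crate_metadata` and its parameters are hypothetical. The Workflow RO-Crate profile that WorkflowHub uses may require more properties (e.g. creators, dates, profile declarations) than shown here, so treat this as an illustration rather than a valid WorkflowHub upload.

```python
import json


def workflow_crate_metadata(workflow_path: str, name: str, language: str,
                            license_id: str) -> dict:
    """Return a minimal RO-Crate 1.1 metadata dict describing one workflow file.

    Hypothetical helper; real WorkflowHub crates carry more metadata.
    """
    lang_id = f"#{language.lower()}"
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor: states which entity this file describes.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root data entity: the crate itself, pointing at the main workflow.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_id},
                "mainEntity": {"@id": workflow_path},
                "hasPart": [{"@id": workflow_path}],
            },
            {   # The main workflow file.
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": lang_id},
            },
            {   # The workflow language, described as a contextual entity.
                "@id": lang_id,
                "@type": "ComputerLanguage",
                "name": language,
            },
        ],
    }


if __name__ == "__main__":
    crate = workflow_crate_metadata(
        "main.nf", "Example pipeline", "Nextflow", "https://spdx.org/licenses/MIT"
    )
    print(json.dumps(crate, indent=2))
```

Keeping every entity in a flat `@graph` and linking them by `@id` mirrors how RO-Crate metadata is normally serialised as JSON-LD.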