A Python workflow definition for computational materials design

Jan Janssen, Janine George, Julian Geiger, Marnik Bercx, Xing Wang, Christina Ertural, Joerg Schaarschmidt, Alex M. Ganose, Giovanni Pizzi, Tilmann Hickel, Joerg Neugebauer

TL;DR

The paper addresses the lack of interoperability among Python-based workflow management systems (WfMS) in computational materials design. It introduces the Python Workflow Definition (PWD), a three-component exchange format (conda environment, Python module, and JSON graph) for DAG-based workflows that enables export and import across AiiDA, jobflow, and pyiron, advancing FAIR workflow principles. Two demonstrations, an arithmetic Python workflow and a Quantum ESPRESSO energy-versus-volume workflow, show how the PWD can encapsulate both pure-Python and file-based steps, and how workflows can be reloaded and parameterized within different WfMS. The authors also illustrate compatibility with non-Python workflows by representing a file-based nfdi4ing benchmark and providing a CWL export path, highlighting the potential for broader interoperability across scientific software. The work lays a foundation for scalable, portable, and reproducible materials-design workflows across diverse computational environments, with future directions aimed at broader WfMS adoption and dynamic workflow support.

Abstract

Numerous Workflow Management Systems (WfMS) with different workflow formats have been developed in the field of computational materials science, hindering the interoperability and reproducibility of workflows in the field. To address this challenge, we introduce the Python Workflow Definition (PWD) as a workflow exchange format for sharing workflows between Python-based WfMS, currently AiiDA, jobflow, and pyiron. This development is motivated by the similarity of these three Python-based WfMS, which represent the different workflow steps and the data transferred between them as nodes and edges in a graph. With the PWD, we aim to foster interoperability and reproducibility between the different WfMS in the context of Findable, Accessible, Interoperable, Reusable (FAIR) workflows. To separate the scientific from the technical complexity, the PWD consists of three components: (1) a conda environment that specifies the software dependencies, (2) a Python module that contains the Python functions represented as nodes in the workflow graph, and (3) a workflow graph stored in JavaScript Object Notation (JSON). The first version of the PWD supports directed acyclic graph (DAG)-based workflows. Thus, any DAG-based workflow defined in one of the three WfMS can be exported to the PWD and afterwards imported from the PWD into one of the other WfMS. After the import, the input parameters of the workflow can be adjusted and computing resources can be assigned to the workflow before it is executed with the selected WfMS. This import from and export to the PWD is enabled by the PWD Python library, which implements the PWD in AiiDA, jobflow, and pyiron.
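
To make the node-and-edge idea concrete, the following is a minimal sketch of how a Python module and a JSON graph could together describe the arithmetic workflow (sum of the product and quotient of two numbers). The JSON schema, the field names (`source_key`, `target_arg`), the function names, and the `run_workflow` helper are all assumptions made for illustration; they are not the actual PWD format or the API of the PWD Python library. The conda environment component is omitted because the sketch only needs the standard library (Python 3.9 or later for `graphlib`).

```python
import json
from graphlib import TopologicalSorter

# Stand-in for the "Python module" component: plain functions used as nodes.
# The function names are hypothetical.
def get_prod_and_div(x, y):
    return {"prod": x * y, "div": x / y}

def get_sum(a, b):
    return a + b

FUNCTIONS = {"get_prod_and_div": get_prod_and_div, "get_sum": get_sum}

# Stand-in for the "JSON graph" component. This schema is an assumption,
# not the real PWD schema.
WORKFLOW_JSON = """
{
  "nodes": [
    {"id": "x", "type": "input", "value": 1.0},
    {"id": "y", "type": "input", "value": 2.0},
    {"id": "prod_div", "type": "function", "value": "get_prod_and_div"},
    {"id": "sum", "type": "function", "value": "get_sum"},
    {"id": "result", "type": "output"}
  ],
  "edges": [
    {"source": "x", "source_key": null, "target": "prod_div", "target_arg": "x"},
    {"source": "y", "source_key": null, "target": "prod_div", "target_arg": "y"},
    {"source": "prod_div", "source_key": "prod", "target": "sum", "target_arg": "a"},
    {"source": "prod_div", "source_key": "div", "target": "sum", "target_arg": "b"},
    {"source": "sum", "source_key": null, "target": "result", "target_arg": "value"}
  ]
}
"""

def run_workflow(graph, functions, overrides=None):
    """Execute a DAG in topological order; raises graphlib.CycleError on cycles."""
    overrides = overrides or {}
    nodes = {node["id"]: node for node in graph["nodes"]}
    incoming = {node_id: [] for node_id in nodes}
    for edge in graph["edges"]:
        incoming[edge["target"]].append(edge)

    # Map each node to the set of nodes it depends on.
    predecessors = {
        node_id: {edge["source"] for edge in edges}
        for node_id, edges in incoming.items()
    }

    values = {}
    for node_id in TopologicalSorter(predecessors).static_order():
        node = nodes[node_id]
        # Collect the arguments delivered by incoming edges.
        kwargs = {}
        for edge in incoming[node_id]:
            value = values[edge["source"]]
            if edge["source_key"] is not None:
                value = value[edge["source_key"]]
            kwargs[edge["target_arg"]] = value

        if node["type"] == "input":
            values[node_id] = overrides.get(node_id, node["value"])
        elif node["type"] == "function":
            values[node_id] = functions[node["value"]](**kwargs)
        else:  # output node: forward the single incoming value
            values[node_id] = kwargs["value"]

    return {i: values[i] for i, n in nodes.items() if n["type"] == "output"}

graph = json.loads(WORKFLOW_JSON)
print(run_workflow(graph, FUNCTIONS))  # {'result': 2.5}
```

Passing `overrides={"x": 4.0, "y": 2.0}` to `run_workflow` mimics adjusting the input parameters after an import, without touching the stored graph.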

Paper Structure

This paper contains 8 sections and 4 figures.

Figures (4)

  • Figure 1: The Python Workflow Definition (PWD) consists of three components: a conda environment, a Python module, and a JSON workflow representation. The three Workflow Management Systems AiiDA, jobflow, and pyiron all support both importing and exporting to and from the PWD.
  • Figure 2: The arithmetic workflow computes the sum of the product and quotient of two numbers. The red nodes of the workflow graph denote inputs, the orange the outputs, and the blue nodes the Python functions for the computations. The labels of the edges denote the data transferred between the nodes.
  • Figure 3: Energy-versus-volume curve calculation workflow with Quantum ESPRESSO. Red boxes denote inputs, orange boxes outputs, blue boxes Python functions and green boxes calls to external executables.
  • Figure 4: File-based finite element workflow from Ref. nfdi4ing implemented with the Python Workflow Definition (PWD). Red nodes denote inputs, orange nodes outputs, green nodes calls to external executables, and the labels on the edges the files and data transferred between them. Files are passed as path objects between the individual steps.
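
The idea in Figure 4 of passing files between steps as path objects can be sketched as follows. This is an illustration only: the three step functions are hypothetical, and `run_solver` is a pure-Python stand-in for what would be a call to an external executable in the real workflow.

```python
import tempfile
from pathlib import Path

def write_input(workdir: Path, value: float) -> Path:
    """Write an input file and return its path to the next step."""
    input_file = workdir / "input.txt"
    input_file.write_text(f"{value}\n")
    return input_file

def run_solver(input_file: Path) -> Path:
    """Stand-in for an external executable: reads one file, writes another."""
    output_file = input_file.with_name("output.txt")
    value = float(input_file.read_text())
    output_file.write_text(f"{value ** 2}\n")
    return output_file

def collect_result(output_file: Path) -> float:
    """Parse the result file back into a Python value."""
    return float(output_file.read_text())

def run_chain(value: float) -> float:
    # Each step receives and returns a Path, so only file locations
    # travel along the edges of the graph.
    with tempfile.TemporaryDirectory() as tmp:
        input_file = write_input(Path(tmp), value)
        output_file = run_solver(input_file)
        return collect_result(output_file)

print(run_chain(3.0))  # 9.0
```

Because every step only exchanges `Path` objects, each function can be treated as a graph node in the same way as a pure-Python function.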