MaRDIFlow: A CSE workflow framework for abstracting meta-data from FAIR computational experiments

Pavan L. Veluvali, Jan Heiland, Peter Benner

TL;DR

The paper tackles the reproducibility and interoperability challenges of metadata-rich computational experiments by introducing MaRDIFlow, a computational science and engineering (CSE) workflow framework that abstracts metadata within an ontology of mathematical objects and encapsulates execution and environment dependencies in multi-layered descriptions. The approach treats workflow components as interchangeable input-output (I/O) objects, so that one component can have several redundant realizations, which improves robustness and reuse. A working CLI prototype demonstrates how MaRDIFlow can document, execute, and track the provenance of FAIR-compliant computational experiments, illustrated with a CO2 methanization reactor model and a 2D Cahn–Hilliard spinodal decomposition example. Contributions include a formal multi-layered abstraction for CSE workflows, integration of provenance, and a path toward an electronic lab notebook (ELN) to visualize and execute MaRDIFlow-enabled experiments. The framework aims to improve the Findability, Accessibility, Interoperability, and Reusability of abstracted workflow components across heterogeneous platforms and use cases in the mathematical sciences.

Abstract

Numerical algorithms and computational tools are instrumental in navigating and addressing complex simulation and data processing tasks. The exponential growth of metadata and parameter-driven simulations has led to an increasing demand for automated workflows that can replicate computational experiments across platforms. In general, a computational workflow is defined as a sequential description for accomplishing a scientific objective, often described by tasks and their associated data dependencies. If characterized through their input-output relations, workflow components can be structured to allow interchangeable utilization of individual tasks and their accompanying metadata. In the present work, we develop a novel computational framework, namely, MaRDIFlow, that focuses on the automation of abstracting meta-data embedded in an ontology of mathematical objects. This framework also effectively addresses the inherent execution and environmental dependencies by incorporating them into multi-layered descriptions. Additionally, we demonstrate a working prototype with example use cases and methodically integrate them into our workflow tool and data provenance framework. Furthermore, we show how to best apply the FAIR principles to computational workflows, such that abstracted components are Findable, Accessible, Interoperable, and Reusable in nature.
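
The idea of a component that is characterized only by its input-output relation, and that carries several redundant descriptions, can be illustrated with a minimal Python sketch. This is not MaRDIFlow's actual API: the names `Realization`, `Component`, `execute`, and the toy `linear_map` example are hypothetical and chosen purely for illustration. The sketch assumes that each realization is a callable from input values to output values and that the first realization producing all declared outputs is accepted.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Values = Dict[str, Any]


@dataclass
class Realization:
    """One description layer of a component (hypothetical structure)."""
    level: str                       # label for the layer, e.g. "container" or "python"
    run: Callable[[Values], Values]  # maps input values to output values


@dataclass
class Component:
    """A workflow unit defined by named inputs and outputs (hypothetical)."""
    name: str
    inputs: List[str]
    outputs: List[str]
    realizations: List[Realization] = field(default_factory=list)

    def execute(self, values: Values) -> Tuple[Values, Dict[str, Any]]:
        # Check the declared input interface before running anything.
        missing = [key for key in self.inputs if key not in values]
        if missing:
            raise KeyError(f"{self.name}: missing inputs {missing}")

        failures: List[str] = []
        # Try the redundant realizations in order; fall back on failure.
        for realization in self.realizations:
            try:
                result = realization.run(values)
            except Exception as exc:
                failures.append(f"{realization.level}: {exc}")
                continue
            if all(key in result for key in self.outputs):
                # Record which layer produced the result and what was skipped.
                provenance = {
                    "component": self.name,
                    "level": realization.level,
                    "inputs": dict(values),
                    "skipped": failures,
                }
                return result, provenance
            failures.append(f"{realization.level}: incomplete outputs")
        raise RuntimeError(f"{self.name}: no realization succeeded: {failures}")


def unavailable_container(values: Values) -> Values:
    # Stand-in for an environment that cannot be reproduced on this machine.
    raise RuntimeError("container runtime not available")


def plain_python(values: Values) -> Values:
    # Equivalent description of the same input-output relation: y = a * x.
    return {"y": values["a"] * values["x"]}


linear = Component(
    name="linear_map",
    inputs=["a", "x"],
    outputs=["y"],
    realizations=[
        Realization("container", unavailable_container),
        Realization("python", plain_python),
    ],
)

result, provenance = linear.execute({"a": 2.0, "x": 3.0})
print(result, provenance["level"])
```

Because downstream steps depend only on the declared inputs and outputs, any realization can be swapped for another without touching the rest of the workflow, and the provenance record notes which one actually ran.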

Paper Structure

This paper contains 2 sections, 9 equations, and 7 figures.

Table of Contents

  1. Introduction
  2. MaRDIFlow

Figures (7)

  • Figure 1: Generic chain of models that describe the workflow in the simulation of transformer noise generation and its realization on different levels of abstraction.
  • Figure 2: An exemplified vertical multi-level dimension of a MaRDIFlow component: equivalent and preferably redundant descriptions of a workflow unit.
  • Figure 3: A screenshot illustrating the help message of our RDM tool, MaRDIFlow.
  • Figure 4: Example inputs object JSON file format with a set of static parameters required for a specific workflow component.
  • Figure 5: Screenshot of an example configparser .ini file to initialize and run MaRDIFlow through command-line.
  • ...and 2 more figures