ProvDeploy: Provenance-oriented Containerization of High Performance Computing Scientific Workflows

Liliane Kunstmann, Débora Pina, Daniel de Oliveira, Marta Mattoso

TL;DR

This manuscript introduces ProvDeploy to assist the user in configuring containers for scientific workflows with integrated provenance data capture, exploring containerization strategies focused on provenance in two distinct HPC environments.

Abstract

Many existing scientific workflows require High Performance Computing environments to produce results in a timely manner. These workflows have several software library components and use different environments, making the deployment and execution of the software stack nontrivial. This complexity increases if the user needs to add provenance data capture services to the workflow. This manuscript introduces ProvDeploy to assist the user in configuring containers for scientific workflows with integrated provenance data capture. ProvDeploy was evaluated with a Scientific Machine Learning workflow, exploring containerization strategies focused on provenance in two distinct HPC environments.

Paper Structure (11 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Architecture of ProvDeploy.
  • Figure 2: DenseED architecture (freitas2021encoder).
  • Figure 3: CPU consumption for coarse-grained and partial modular strategies.
  • Figure 4: CPU consumption for provenance modular strategy.