Towards cloud-native scientific workflow management

Michal Orzechowski; Bartosz Balis; Krzysztof Janecki

TL;DR

The paper investigates how to execute scientific workflows on Kubernetes-enabled cloud-native infrastructures by comparing a simple Job-based model, a Job-based model with task clustering, and a novel Worker Pools model implemented in HyperFlow. Using the Montage workflow on a Kubernetes/OpenStack cluster, it finds that the Worker Pools approach delivers the best cluster utilization and shortest makespan, though it introduces higher implementation and maintenance complexity. The work highlights a fundamental trade-off between simplicity and performance in cloud-native workflow management and provides concrete architectural guidance for choosing between models based on resource and maintenance constraints. It also contributes a practical HyperFlow implementation of the Worker Pools model that demonstrates the feasibility of microservice-based, auto-scalable task execution in scientific workflows.

Abstract

Cloud-native is an approach to building and running scalable applications in modern cloud infrastructures, and the Kubernetes container orchestration platform is often considered a fundamental cloud-native building block. In this paper, we evaluate alternative execution models for scientific workflows in Kubernetes. We compare the simplest job-based model and its variant with task clustering, and we propose a cloud-native model based on microservices comprising auto-scalable worker pools. We implement these models in the HyperFlow workflow management system and evaluate them using a large Montage workflow on a Kubernetes cluster. The results indicate that the proposed cloud-native worker-pools execution model achieves the best performance in terms of average cluster utilization, resulting in a nearly 20% improvement in workflow makespan compared to the best-performing job-based model. However, better performance comes at the cost of significantly higher implementation and maintenance complexity. We believe that our experiments provide valuable insight into the performance, advantages, and disadvantages of alternative cloud-native execution models for scientific workflows.
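
To make the difference between the plain job model and its task-clustering variant more concrete, the following is a minimal sketch, not the HyperFlow implementation. It assumes a hypothetical representation of a ready task as a `(task_id, task_type)` pair and a hypothetical `cluster_size` parameter; each returned batch stands for the set of tasks that would be submitted as a single Kubernetes Job. The task type names are used for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical task representation: (task_id, task_type).
Task = Tuple[str, str]


def cluster_tasks(ready_tasks: List[Task], cluster_size: int) -> List[List[Task]]:
    """Group ready tasks of the same type into batches of at most cluster_size.

    Each batch stands for one Kubernetes Job. With cluster_size == 1 this
    degenerates to the plain job model (one Job per task).
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be >= 1")

    # Only tasks of the same type are batched together.
    by_type: Dict[str, List[Task]] = defaultdict(list)
    for task in ready_tasks:
        by_type[task[1]].append(task)

    batches: List[List[Task]] = []
    for tasks in by_type.values():
        for start in range(0, len(tasks), cluster_size):
            batches.append(tasks[start:start + cluster_size])
    return batches


# Illustrative input: five tasks of one type and three of another.
tasks = [(f"p{i}", "mProject") for i in range(5)] + \
        [(f"d{i}", "mDiffFit") for i in range(3)]

plain = cluster_tasks(tasks, 1)      # plain job model: one Job per task
clustered = cluster_tasks(tasks, 2)  # clustering: fewer, larger Jobs

print(len(plain), len(clustered))
```

In this sketch, clustering reduces the number of Jobs that have to be created and scheduled, at the price of running the tasks inside one batch sequentially. The worker-pools model instead keeps long-running workers per task type and scales their number with load, so it is not captured by this function.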

Paper Structure (14 sections, 6 figures, 1 table)

Introduction
Related Work
Alternative execution models for scientific workflows on Kubernetes
Scheduling and auto-scaling in Kubernetes
Job-based execution model
Worker Pools execution model
Challenges for scientific workflow execution
Implementation in HyperFlow
Experiments
Experiment setup
The job model
The job model with task clustering
The worker pools model
Conclusions and Future Work

Figures (6)

Figure 1: Job-based execution model for scientific workflow in Kubernetes. Each task of the workflow is executed as a separate Kubernetes Job.
Figure 2: Worker Pool execution model for scientific workflows in Kubernetes. For each task type in the workflow, a separate deployment is created with associated Pods acting as workers for workflow tasks. The Horizontal Pod Autoscaler scales the deployment up and down by creating Pod replicas, based on the current load.
Figure 3: Execution of the experimental workflow -- the job model.
Figure 4: Execution of the experimental workflow -- the job model with task clustering. The subplot shows cluster utilization -- the number of workflow tasks executing in parallel at any given time.
Figure 5: Executions of the experimental workflow -- the job model with task clustering with various clustering parameters.
...and 1 more figure
