Adaptive Resource Allocation for Workflow Containerization on Kubernetes

Chenggang Shan, Chuge Wu, Yuanqing Xia, Zehua Guo, Danyang Liu, Jinhui Zhang

TL;DR

ARAS is an adaptive resource allocation scheme for Kubernetes-based workflow engines. It integrates with a tailored workflow engine, KubeAdaptor, through the MAPE-K model to handle continuous workflow requests and resource request spikes. It combines a Resource Discovery module, a Resource Evaluator, and a vertical autoscaling-based Allocator to maximize resource utilization while respecting per-task and per-node constraints, using proportional scaling rules $cpu_{cut}$ and $mem_{cut}$ guided by $totalResidual$ and $request$. Experimental evaluation across Montage, Epigenomics, CyberShake, and LIGO workflows under three arrival patterns shows reductions of up to 40.92% in the average total duration of all workflows and up to 79.86% in the average duration of individual workflows, with 1%–16% increases in CPU and memory usage rates. The work demonstrates self-healing and self-configuration capabilities via OOMKilled recovery and dynamic reallocation, and points to future work on deep reinforcement learning and cloud-edge collaboration for resource provisioning in cloud-native workflow systems.

Abstract

In the cloud-native era, Kubernetes-based workflow engines enable containerized workflow execution through the inherent abilities of Kubernetes. However, when facing continuous workflow requests and unexpected resource request spikes, such an engine relies only on current workflow load information for resource allocation. The allocation therefore lacks agility and predictability, which results in over- and under-provisioning of resources. This mechanism seriously hinders workflow execution efficiency and leads to high resource waste. To overcome these drawbacks, we propose an adaptive resource allocation scheme named ARAS for Kubernetes-based workflow engines. Considering potential future workflow task requests within the current task pod's lifecycle, ARAS uses a resource scaling strategy to allocate resources in response to high-concurrency workflow scenarios. ARAS offers resource discovery, resource evaluation, and allocation functionalities and serves as a key component of our tailored workflow engine (KubeAdaptor). By integrating ARAS into KubeAdaptor for containerized workflow execution, we demonstrate the practical abilities of KubeAdaptor and the advantages of ARAS. Compared with the baseline algorithm, experimental evaluation under three distinct workflow arrival patterns shows that ARAS achieves time savings of 9.8% to 40.92% in the average total duration of all workflows, time savings of 26.4% to 79.86% in the average duration of individual workflows, and an increase of 1% to 16% in CPU and memory resource usage rates.
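
The resource scaling strategy can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes one plausible reading of "proportional scaling", namely that when the combined demand of concurrent tasks exceeds the residual cluster resources, each task's request is cut in proportion to the shortfall. The names `Resources` and `scale_request`, the units (millicores, MiB), and the integer rounding are all hypothetical choices made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_milli: int  # CPU in millicores (assumed unit)
    mem_mi: int     # memory in MiB (assumed unit)


def _cut(want: int, demand: int, residual: int) -> int:
    """Scale one resource dimension proportionally when demand exceeds supply."""
    residual = max(residual, 0)  # treat a negative residual as "nothing left"
    if demand <= residual:
        return want  # enough headroom: grant the request unchanged
    # Shortfall: shrink the request by the ratio residual / demand.
    return want * residual // demand


def scale_request(task: Resources,
                  total_request: Resources,
                  total_residual: Resources) -> Resources:
    """Return the (possibly reduced) allocation for a single task.

    task           -- what this task asks for
    total_request  -- assumed: summed requests of tasks competing in the
                      current pod lifecycle (including this task)
    total_residual -- assumed: resources still free across the cluster
    """
    return Resources(
        cpu_milli=_cut(task.cpu_milli, total_request.cpu_milli,
                       total_residual.cpu_milli),
        mem_mi=_cut(task.mem_mi, total_request.mem_mi,
                    total_residual.mem_mi),
    )


if __name__ == "__main__":
    task = Resources(cpu_milli=2000, mem_mi=4000)
    demand = Resources(cpu_milli=8000, mem_mi=8000)
    free = Resources(cpu_milli=4000, mem_mi=16000)
    # CPU is oversubscribed 2x and is halved; memory fits and is untouched.
    print(scale_request(task, demand, free))
```

In this sketch CPU and memory are scaled independently, so a task can be cut on one dimension and left whole on the other. A real allocator would also need a lower bound per task so that a cut does not trigger OOMKilled failures; that bound is omitted here.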

Paper Structure

This paper contains 29 sections, 10 equations, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Resource allocation example. A small-scale Montage workflow with 21 tasks is used to illustrate the resource scaling method in ARAS. The test environment uses the experimental setup in Section \ref{sec:senario}.
  • Figure 2: KubeAdaptor architecture.
  • Figure 3: Resource allocation scheme based on MAPE-K model.
  • Figure 4: The topology diagram of four scientific workflow applications.
  • Figure 5: The CPU and memory resource usage rate under three distinct arrival patterns for Montage workflows.
  • ...and 4 more figures