Minimizing Energy in Reliability and Deadline-Ensured Workflow Scheduling in Cloud

Suvarthi Sarkar, Dhanesh V, Ketan Singh, Aryabartta Sahu

TL;DR

This work tackles the problem of minimizing energy in reliability- and deadline-constrained cloud workflows with data-dependent execution times. It introduces an adaptive strategy based on Maximum Fan-Out Ratio (MFR) that selects between Largest Energy First (LEF) and Level-based Deadline Distribution (LDD) under a rolling-horizon framework (Roll-H), with a dynamic variant (Dy) that continuously revises schedules as tasks complete early. The approach builds explicit energy and reliability models, including $E(W) = \sum_{j,l,k} x_{j,l,k}\, E(t_j, f_k^{VM_l})$ and $R(W)$ as a product of per-task reliabilities, enhanced by a primary/backup replication scheme to meet hard reliability constraints. Empirical results show the static approach outperforming SOTA by up to 70% (non-deadline) and ~2% (deadline), while the dynamic variant surpasses SOTA by up to 82% (non-deadline) and ~27% (deadline); relative to the static optimum, the static method is within 1.1× and the dynamic approach is about 25% better on average. Collectively, the findings demonstrate that adaptive, real-time energy optimization under deadline and reliability constraints can yield substantial energy savings in cloud workflow management, with practical implications for energy-aware providers.
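The energy and reliability models named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the task names, energy and reliability values, and the helper functions are all hypothetical. The replication rule (a replicated task fails only if both primary and backup fail, and both copies consume energy) is an assumption made for illustration, since the summary does not spell out the exact backup model.

```python
from math import prod

# Hypothetical assignment: for each task, the energy and reliability of the
# single (VM type, frequency) pair selected for it, i.e. the pair with x = 1.
# All numbers are illustrative, not taken from the paper.
assignment = {
    "t1": {"energy": 12.0, "reliability": 0.990},
    "t2": {"energy": 8.5, "reliability": 0.985},
    "t3": {"energy": 20.0, "reliability": 0.970},
}


def workflow_energy(assignment):
    """E(W): sum of the energies of the selected (task, VM, frequency) choices."""
    return sum(choice["energy"] for choice in assignment.values())


def workflow_reliability(assignment):
    """R(W): product of the per-task reliabilities."""
    return prod(choice["reliability"] for choice in assignment.values())


def with_backup(primary, backup):
    """Combine a primary and a backup copy of one task.

    Assumption (not from the paper): the task fails only if both copies
    fail, and the energy of both copies is always paid.
    """
    return {
        "energy": primary["energy"] + backup["energy"],
        "reliability": 1.0
        - (1.0 - primary["reliability"]) * (1.0 - backup["reliability"]),
    }


# Replicate the least reliable task and compare the workflow before and after.
replicated = dict(assignment)
replicated["t3"] = with_backup(assignment["t3"], assignment["t3"])

print(workflow_energy(assignment), workflow_reliability(assignment))
print(workflow_energy(replicated), workflow_reliability(replicated))
```

Under these assumptions, replicating a task raises $R(W)$ at the cost of extra energy, which is the trade-off a scheduler has to weigh when a hard reliability constraint cannot be met by primary copies alone.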

Abstract

With the increasing prevalence of computationally intensive workflows in cloud environments, it has become crucial for cloud platforms to optimize energy consumption while ensuring the feasibility of user workflow schedules with respect to strict deadlines and reliability constraints. The key challenges arise when cloud systems provide virtual machines with varying levels of reliability, energy consumption, processing frequency, and computing capability to execute the tasks of these workflows. To address these issues, we propose an adaptive strategy based on the maximum fan-out ratio, which considers task slack and deadline distribution, for scheduling workflows on a single cloud platform, with the aim of minimizing energy consumption while satisfying strict reliability and deadline constraints. We also propose a dynamic workflow scheduling approach based on the rolling-horizon concept, which accounts for dynamic task execution times: in most cases, the actual execution time of a task at run time is shorter than its worst-case execution time. Our proposed static approach outperforms the state-of-the-art (SOTA) by up to 70% on average in scenarios without deadline constraints, and achieves an improvement of approximately 2% in deadline-constrained cases. The dynamic variant of our approach demonstrates even stronger performance, surpassing SOTA by 82% in non-deadline scenarios and by up to 27% on average when deadline constraints are enforced. Furthermore, in comparison with the static optimal solution, our static approach yields results within a factor of 1.1, while the dynamic approach surpasses this static optimal baseline by an average of 25%.

Paper Structure

This paper contains 34 sections, 18 equations, 9 figures, 1 table, 5 algorithms.

Figures (9)

  • Figure 1: Users submit workflows to a cloud resource provider $\mathcal{C}$. $\mathcal{C}$ has access to an unlimited pool of Physical Machines (PMs), each capable of hosting virtual machines (VMs) of any type from a predefined set of $L$ VM types.
  • Figure 2: Central Scheduler Architecture. The components enclosed within the dotted line represent dynamic events that occur at runtime. Dynamic events and system-related information are colour-coded, and distinct markers separate the steps handled by our proposed approach from the operations managed by the cloud platform.
  • Figure 3: Subfigures A and B show two workflows used to compare LEF and LDD. Subfigure C shows a bottleneck task $t_b$.
  • Figure 4: Comparison of total energy consumption across workflow benchmarks against REMSM-DVFS (deadline-based SOTA). Our proposed approach is the better of LEF and LDD. Markers indicate whether $R_w$ is met. The Gurobi solver did not converge for the Inspiral workflow.
  • Figure 5: Comparison of total energy consumption across workflow benchmarks against REWS (non-deadline-based SOTA). Our proposed approach is the better of LEF and LDD.
  • ...and 4 more figures