ASA -- The Adaptive Scheduling Algorithm

Abel Souza, Kristiaan Pelckmans, Devarshi Ghoshal, Lavanya Ramakrishnan, Johan Tordsson

TL;DR

The paper tackles prolonged queue waits in HPC batch systems for data-intensive scientific workflows by introducing ASA, an adaptive scheduling algorithm that learns queue waiting times online and proactively submits resource change requests to reduce inter-stage waiting. ASA uses a reinforcement-learning-inspired, convergence-proven framework that maintains a distribution over a fixed set of waiting-time alternatives and updates it as workflow stages execute. Real-world experiments across two supercomputers and three representative workflows show ASA achieving near-optimal resource utilization while reducing average workflow queue waiting times by up to about 10% and makespan by around 2%, with robust performance under queue workload variability. The proposed Mesos-based Unified View and proactive scheduling library enable workflow management systems (WMS) to operate over a global resource pool, offering fault tolerance and elasticity while maintaining workflow ordering and QoS constraints, with promising implications for scalable, low-latency scientific data processing.

Abstract

In High Performance Computing (HPC) infrastructures, the control of resources by batch systems can lead to prolonged queue waiting times and adverse effects on the overall execution times of applications, particularly in data-intensive and low-latency workflows where efficient processing hinges on resource planning and timely allocation. Allocating the maximum capacity upfront ensures the fastest execution but results in spare and idle resources, extended queue waits, and costly usage. Conversely, dynamic allocation based on workflow stage requirements optimizes resource usage but may negatively impact the total workflow makespan. To address these issues, we introduce ASA, the Adaptive Scheduling Algorithm. ASA is a novel, convergence-proven scheduling technique that minimizes jobs' inter-stage waiting times by estimating queue waiting times and proactively submitting resource change requests ahead of time. It strikes a balance between exploration and exploitation, that is, between learning waiting times and applying the learnt insights. Real-world experiments over two supercomputer centers with scientific workflows demonstrate ASA's effectiveness, achieving near-optimal resource utilization and accuracy, with up to 10% and 2% reductions in average workflow queue waiting times and makespan, respectively.
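
To make the idea of learning over a fixed set of waiting-time alternatives concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a generic exponential-weights update with a normalized absolute-error loss; the class name `WaitTimeEstimator`, the `learning_rate` parameter, and the loss itself are hypothetical, since this summary does not reproduce ASA's exact update rule.

```python
import math
import random


class WaitTimeEstimator:
    """Keeps a probability distribution over fixed waiting-time alternatives.

    Assumption: a generic exponential-weights update stands in for
    ASA's actual rule, which is not given in this summary.
    """

    def __init__(self, alternatives, learning_rate=0.5, seed=None):
        self.alternatives = list(alternatives)  # candidate waits, e.g. in seconds
        self.learning_rate = learning_rate      # hypothetical tuning knob
        n = len(self.alternatives)
        self.weights = [1.0 / n] * n            # start from a uniform distribution
        self.rng = random.Random(seed)

    def distribution(self):
        """Return the current probabilities, one per alternative."""
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def sample(self):
        """Exploration: draw an alternative according to the distribution."""
        return self.rng.choices(self.alternatives, weights=self.weights, k=1)[0]

    def best(self):
        """Exploitation: return the currently most probable alternative."""
        i = max(range(len(self.weights)), key=self.weights.__getitem__)
        return self.alternatives[i]

    def update(self, observed_wait):
        """Down-weight each alternative by its normalized absolute error."""
        scale = (max(self.alternatives) - min(self.alternatives)) or 1.0
        for i, theta in enumerate(self.alternatives):
            loss = min(abs(theta - observed_wait) / scale, 1.0)
            self.weights[i] *= math.exp(-self.learning_rate * loss)
        total = sum(self.weights)               # renormalize to avoid underflow
        self.weights = [w / total for w in self.weights]


estimator = WaitTimeEstimator([60, 300, 900, 3600], seed=0)
for _ in range(50):
    estimator.update(observed_wait=320)         # waits observed as stages execute
print(estimator.best(), estimator.distribution())
```

Calling `sample()` keeps some probability on less likely alternatives, while `best()` commits to the current favorite; mixing the two is one simple way to trade exploration against exploitation.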

Paper Structure (20 sections, 1 theorem, 4 equations, 9 figures, 2 tables, 1 algorithm)

This paper contains 20 sections, 1 theorem, 4 equations, 9 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Let $\theta=(\theta_1, \dots, \theta_m)\in\mathbb{R}^m$ be a fixed, given collection of waiting time alternatives amongst which to choose. Let the ASA algorithm run on a sequence of $t$ processes, and let $\eta(t)$ denote the number of mini-batches created by the algorithm as of time $t$. Then for a

Figures (9)

  • Figure 1: Excerpt from the Montage scientific workflow, an image mosaic software employed by NASA [berriman2004montage]. Different colors in the graph represent distinct sets of tasks within a stage. Outputs generated in each stage serve as inputs for subsequent stages, ultimately culminating in the final result.
  • Figure 2: (a) Big Job vs (b) Per-Stage managed resource allocation strategies in HPC. (a): a single allocation for the entire workflow duration, with a single queue waiting time. (b): per-stage allocations with only as many resources as a particular stage requires, with extra inter-stage queue waiting times. Note the differences in makespan and resource charging in each case (sum of the areas under the dashed red lines).
  • Figure 3: ASA - Architecture managing the physical resources. Tasks (the different shapes in the partitions) from different jobs can access resources from multiple jobs. The unified view layer enables users to apply different scheduling strategies, such as pro-active job submissions.
  • Figure 4: ASA - Algorithm workflow illustrating two concurrent pro-active submissions (2 and 3) within ongoing stages. Note the per-stage charging and lower workflow makespan.
  • Figure 5: ASA's estimation convergence over time regarding queue waiting time (dark dashed blue line) with three different sampling policies: Greedy (red dotted line), ASA's default (black line), and ASA tuned (light pink line).
  • ...and 4 more figures
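
The trade-off described in the captions of Figures 2 and 4 can be sketched numerically. The following is an illustrative model under simplifying assumptions, not the paper's evaluation: stage runtimes are known in advance, an allocation granted early sits idle and is still charged, and all names (`Stage`, `big_job`, `per_stage`, `proactive`) and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    cores: int      # cores the stage actually needs
    runtime: float  # execution time in hours


def big_job(stages, queue_wait):
    """One allocation sized for the widest stage, held for the whole workflow."""
    run = sum(s.runtime for s in stages)
    peak = max(s.cores for s in stages)
    return queue_wait + run, peak * run  # (makespan, charged core-hours)


def per_stage(stages, queue_waits):
    """One allocation per stage, each submitted when the previous stage ends."""
    run = sum(s.runtime for s in stages)
    return sum(queue_waits) + run, sum(s.cores * s.runtime for s in stages)


def proactive(stages, queue_waits, estimates):
    """Per-stage allocations submitted `estimate` hours before the previous
    stage ends. Assumption: resources granted too early idle and are charged."""
    prev_end, core_hours = 0.0, 0.0
    for stage, wait, est in zip(stages, queue_waits, estimates):
        submit = max(0.0, prev_end - est)    # submit ahead of time
        granted = submit + wait              # when the batch system grants it
        idle = max(0.0, prev_end - granted)  # over-estimate: paid idle time
        start = max(prev_end, granted)       # under-estimate: residual wait
        core_hours += stage.cores * (stage.runtime + idle)
        prev_end = start + stage.runtime
    return prev_end, core_hours


stages = [Stage(16, 2.0), Stage(128, 1.0), Stage(16, 0.5)]
waits = [1.0, 1.0, 1.0]  # actual queue waits per submission, in hours

print(big_job(stages, waits[0]))                    # (4.5, 448.0)
print(per_stage(stages, waits))                     # (6.5, 168.0)
print(proactive(stages, waits, [0.0, 0.75, 1.25]))  # (4.75, 172.0)
```

In this toy setting, the under-estimate for the second stage (0.75 h against an actual 1 h wait) leaves a 0.25 h gap, while the over-estimate for the third stage (1.25 h) costs 0.25 h of charged idle time. This illustrates why the accuracy of the waiting-time estimate matters for both makespan and charging.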

Theorems & Definitions (1)

  • Theorem 1