Memory-aware Adaptive Scheduling of Scientific Workflows on Heterogeneous Architectures

Svetlana Kulagina, Anne Benoit, Henning Meyerhenke

TL;DR

This work tackles memory-constrained scheduling of DAG-structured scientific workflows on heterogeneous hardware. It extends HEFT with memory-aware variants (HEFTM-BL, HEFTM-BLC, HEFTM-MM) that incorporate per-processor memory, eviction to communication buffers, and on-the-fly schedule adaptation when task parameters deviate. The approach is evaluated on real-world and synthetic workflows, showing that memory-aware heuristics enable valid schedules where standard HEFT fails, while dynamic recomputation improves robustness and can significantly reduce makespan in challenging environments. The results highlight a trade-off between memory efficiency and scheduling speed, with HEFTM-MM achieving superior memory usage and validity under tight constraints, offering practical benefits for large-scale, data-intensive scientific pipelines.

Abstract

The analysis of massive scientific data often happens in the form of workflows with interdependent tasks. When such a scientific workflow needs to be scheduled on a parallel or distributed system, one usually represents the workflow as a directed acyclic graph (DAG). The vertices of the DAG represent the tasks, while its edges model the dependencies between the tasks (data to be communicated to successor tasks). When executed, each task requires a certain amount of memory and if it exceeds the available memory, the execution fails. The typical goal is to execute the workflow without failures (satisfying the memory constraints) and with the shortest possible execution time (minimize its makespan). To address this problem, we investigate the memory-aware scheduling of DAG-shaped workflows on heterogeneous platforms, where each processor can have a different speed and a different memory size. We propose a variant of HEFT (Heterogeneous Earliest Finish Time) that accounts for memory and includes eviction strategies for cases when it might be beneficial to remove some data from memory in order to have enough memory to execute other tasks. Furthermore, while HEFT assumes perfect knowledge of the execution time and memory usage of each task, the actual values might differ upon execution. Thus, we propose an adaptive scheduling strategy, where a schedule is recomputed when there has been a significant variation in terms of execution time or memory. The scheduler has been integrated with a runtime system, allowing us to perform a thorough experimental evaluation on real-world workflows. The runtime system warns the scheduler when the task parameters change, so a schedule is recomputed on the fly. The memory-aware strategy allows us to schedule task graphs that would run out of memory with a state-of-the-art scheduler, and the adaptive setting allows us to significantly reduce the makespan.
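To make the memory constraint concrete, the following is a minimal sketch of an earliest-finish-time placement loop that skips processors whose memory is too small for a task. It is not the authors' HEFTM algorithm: it assumes tasks arrive already sorted in a valid priority (topological) order, ignores communication costs and data kept in memory between tasks, and performs no eviction. The names `Processor`, `Task`, and `schedule` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    speed: float           # work units processed per time unit (assumed model)
    memory: float          # total memory capacity of this processor
    ready_at: float = 0.0  # time at which the processor becomes free


@dataclass
class Task:
    name: str
    work: float            # amount of work; run time = work / speed
    memory: float          # memory needed while the task executes
    preds: list = field(default_factory=list)  # names of predecessor tasks


def schedule(tasks, processors):
    """Place each task on the feasible processor with the earliest finish time.

    `tasks` must be in an order where predecessors come first.
    Returns {task name: (processor name, start, finish)}.
    Raises MemoryError if a task fits on no processor.
    """
    placement = {}
    for task in tasks:
        # A task can start only after all of its predecessors have finished.
        ready = max((placement[p][2] for p in task.preds), default=0.0)
        best = None
        for proc in processors:
            if task.memory > proc.memory:
                continue  # memory constraint: this processor cannot run the task
            start = max(ready, proc.ready_at)
            finish = start + task.work / proc.speed
            if best is None or finish < best[2]:
                best = (proc, start, finish)
        if best is None:
            raise MemoryError(f"task {task.name!r} fits on no processor")
        proc, start, finish = best
        proc.ready_at = finish
        placement[task.name] = (proc.name, start, finish)
    return placement


# Hypothetical example: a fast processor with little memory and a slow one with more.
procs = [Processor("fast", speed=2.0, memory=4.0),
         Processor("slow", speed=1.0, memory=16.0)]
dag = [Task("a", work=4.0, memory=2.0),
       Task("b", work=4.0, memory=8.0, preds=["a"]),
       Task("c", work=2.0, memory=2.0, preds=["a"])]
placement = schedule(dag, procs)
makespan = max(finish for _, _, finish in placement.values())
print(placement, makespan)
```

In this toy instance, task `b` is forced onto the slower processor because it does not fit in the fast processor's memory; a scheduler that ignored memory would pick the fast processor and the execution would fail. An adaptive variant in the spirit of the abstract would rerun such a placement for the not-yet-started tasks whenever observed run times or memory use deviate significantly from the estimates.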

Paper Structure

This paper contains 26 sections, 5 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Success rates by workflow size and algorithm on the default cluster. Higher is better.
  • Figure 2: Relative makespans of the heuristics, normalized by the HEFT makespan, by workflow size, on the default cluster. Smaller is better.
  • Figure 3: Memory usage on the default cluster, including invalid HEFT schedules.
  • Figure 4: Memory usage on the default cluster, considering only valid HEFT schedules.
  • Figure 5: Success rates on the memory-constrained cluster. Higher is better.
  • ...and 4 more figures