Mapping Large Memory-constrained Workflows onto Heterogeneous Platforms

Svetlana Kulagina, Henning Meyerhenke, Anne Benoit

TL;DR

This work investigates how to partition and map DAG-shaped workflows onto heterogeneous platforms, where each processor can have a different speed and a different memory size, and presents a four-step heuristic that exploits this heterogeneity.

Abstract

Scientific workflows are often represented as directed acyclic graphs (DAGs), where vertices correspond to tasks and edges represent the dependencies between them. Since these graphs are often large in both the number of tasks and their resource requirements, it is important to schedule them efficiently on parallel or distributed compute systems. Typically, each task requires a certain amount of memory to be executed and needs to communicate data to its successor tasks. The goal is thus to execute the workflow as fast as possible (i.e., to minimize its makespan) while satisfying the memory constraints. Hence, we investigate the partitioning and mapping of DAG-shaped workflows onto heterogeneous platforms where each processor can have a different speed and a different memory size. We first propose a baseline algorithm in the absence of existing memory-aware solutions. As our main contribution, we then present a four-step heuristic. Its first step is to partition the input DAG into smaller blocks with an existing DAG partitioner. The next two steps adapt the resulting blocks of the DAG to fit the processor memories and optimize for the overall makespan by further splitting and merging these blocks. Finally, we use local search via block swaps to further improve the makespan. Our experimental evaluation on real-world and simulated workflows with up to 30,000 tasks shows that exploiting the heterogeneity with the four-step heuristic reduces the makespan by a factor of 2.44 on average (even more on large workflows), compared to the baseline that ignores heterogeneity.
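To make the objective concrete, the following minimal sketch evaluates one mapping of blocks to processors. It is not the paper's cost model: it assumes one block per processor, a single memory requirement per block, a block time of work divided by processor speed, and a transfer time of data volume divided by a uniform bandwidth. The function `makespan` and all names and numbers in the example are hypothetical.

```python
from graphlib import TopologicalSorter


def makespan(blocks, edges, mapping, procs, bandwidth=1.0):
    """Return the makespan of a block-to-processor mapping, or None if
    some block does not fit into the memory of its processor.

    blocks:  {block: (work, memory_requirement)}   (assumed model)
    edges:   {(src_block, dst_block): data_volume}
    mapping: {block: processor}
    procs:   {processor: (speed, memory_size)}
    """
    # Memory constraint: every block must fit on its processor.
    for block, proc in mapping.items():
        if blocks[block][1] > procs[proc][1]:
            return None

    # Predecessor sets of the (acyclic) block graph.
    preds = {block: set() for block in blocks}
    for src, dst in edges:
        preds[dst].add(src)

    # Longest path: a block starts once all incoming data has arrived.
    finish = {}
    for block in TopologicalSorter(preds).static_order():
        start = max(
            (finish[p] + edges[(p, block)] / bandwidth for p in preds[block]),
            default=0.0,
        )
        speed = procs[mapping[block]][0]
        finish[block] = start + blocks[block][0] / speed
    return max(finish.values())


# Hypothetical instance: three blocks, three heterogeneous processors.
blocks = {"A": (8, 4), "B": (4, 2), "C": (2, 1)}
edges = {("A", "B"): 2, ("A", "C"): 1}
procs = {"fast": (4, 4), "mid": (2, 2), "slow": (1, 8)}

before_swap = {"A": "slow", "B": "mid", "C": "fast"}
after_swap = {"A": "fast", "B": "mid", "C": "slow"}   # A and C swapped
infeasible = {"A": "mid", "B": "fast", "C": "slow"}   # A exceeds mid's memory

print(makespan(blocks, edges, before_swap, procs))  # 12.0
print(makespan(blocks, edges, after_swap, procs))   # 6.0
print(makespan(blocks, edges, infeasible, procs))   # None
```

In this toy instance, swapping the processors of blocks A and C halves the makespan while keeping the mapping memory-feasible, which is the kind of move a swap-based local search would accept.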

Paper Structure

This paper contains 26 sections, 4 equations, 9 figures, 4 tables, and 5 algorithms.

Figures (9)

  • Figure 1: An example graph $G$, its possible acyclic partition $\mathcal{F}$ into four blocks, and a resulting quotient graph $\Gamma$.
  • Figure 2: Merging two vertices can create a cycle of length 2. Merging all three vertices may solve the problem.
  • Figure 3: Left: Relative makespan (in %) of DagHetPart compared to DagHetMem on the default cluster. Right: Relative makespan (in %) on different cluster sizes ($x$-axis: number of CPUs), by workflow size. Smaller is better.
  • Figure 4: Relative (left, baseline: DagHetMem) and absolute (right) makespan of DagHetPart for different levels of heterogeneity. Smaller is better.
  • Figure 5: Makespan of DagHetPart relative to DagHetMem for different workflow families. Dotted lines are meant to improve readability.
  • ...and 4 more figures
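
The situation described in the captions of Figures 1 and 2 can be reproduced on a three-vertex example. The sketch below is illustrative only and assumes a toy DAG with edges a→b, b→c, and a→c. The helpers `quotient_edges` and `is_acyclic` are hypothetical names, not taken from the paper.

```python
from graphlib import CycleError, TopologicalSorter


def quotient_edges(edges, block_of):
    """Edges of the quotient graph: one edge per pair of distinct blocks
    that are connected by at least one edge of the original DAG."""
    return {
        (block_of[u], block_of[v])
        for u, v in edges
        if block_of[u] != block_of[v]
    }


def is_acyclic(nodes, edges):
    """Check acyclicity by attempting a topological sort."""
    preds = {n: set() for n in nodes}
    for u, v in edges:
        preds[v].add(u)
    try:
        TopologicalSorter(preds).prepare()  # raises CycleError on a cycle
    except CycleError:
        return False
    return True


def partition_is_acyclic(edges, block_of):
    blocks = set(block_of.values())
    return is_acyclic(blocks, quotient_edges(edges, block_of))


# Toy DAG (an assumption for illustration): a -> b -> c and a -> c.
dag = [("a", "b"), ("b", "c"), ("a", "c")]

separate = {"a": 0, "b": 1, "c": 2}    # every vertex is its own block
merge_two = {"a": 0, "b": 1, "c": 0}   # merge a and c: blocks 0 <-> 1
merge_all = {"a": 0, "b": 0, "c": 0}   # merge all three vertices

print(partition_is_acyclic(dag, separate))   # True
print(partition_is_acyclic(dag, merge_two))  # False
print(partition_is_acyclic(dag, merge_all))  # True
```

Merging a and c yields quotient edges in both directions between the two blocks, a cycle of length 2. Merging all three vertices removes every inter-block edge, so the quotient graph is acyclic again.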