Ponder: Online Prediction of Task Memory Requirements for Scientific Workflows

Fabian Lehmann, Jonathan Bader, Ninon De Mecquenem, Xing Wang, Vasilis Bountris, Florian Friederici, Ulf Leser, Lauritz Thamsen

TL;DR

Ponder is a new online task-sizing strategy that considers and chooses between different methods to cater to different memory demand patterns. Compared with a state-of-the-art method, it improves Memory Allocation Quality by 71.0% on average and makespan by 21.8%.

Abstract

Scientific workflows are used to analyze large amounts of data. These workflows comprise numerous tasks, many of which are executed repeatedly, running the same custom program on different inputs. Users specify resource allocations for each task, which must be sufficient for all inputs to prevent task failures. As a result, task memory allocations tend to be overly conservative, wasting precious cluster resources, limiting overall parallelism, and increasing workflow makespan. In this paper, we first benchmark a state-of-the-art method on four real-life workflows from the nf-core workflow repository. This analysis reveals that certain assumptions underlying current prediction methods, which typically were evaluated only on simulated workflows, cannot generally be confirmed for real workflows and executions. We then present Ponder, a new online task-sizing strategy that considers and chooses between different methods to cater to different memory demand patterns. We implemented Ponder for Nextflow and made the code publicly available. In an experimental evaluation that also considers the impact of memory predictions on scheduling, Ponder improves Memory Allocation Quality on average by 71.0% and makespan by 21.8% in comparison to a state-of-the-art method. Moreover, Ponder produces 93.8% fewer task failures.
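
To make the idea of a per-task memory predictor concrete, the following is a minimal sketch of the two kinds of estimate that the Figure 2 caption mentions: a linear regression on input size shifted upward by an offset, and a 95th-percentile value of past peaks. It is not Ponder's implementation. All function names are hypothetical, the offset is assumed to be the root-mean-square of the regression residuals, and the percentile uses the nearest-rank definition. How Ponder chooses between methods is not described in this section, so no selection rule is sketched.

```python
import math
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # All inputs have the same size: fall back to a constant model.
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def residual_std(xs, ys, slope, intercept):
    """Root-mean-square of the residuals (assumed meaning of the offset)."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def predict_regression(xs, ys, x_new):
    """Regression estimate for x_new, shifted up by the residual spread."""
    slope, intercept = fit_linear(xs, ys)
    return slope * x_new + intercept + residual_std(xs, ys, slope, intercept)


def predict_percentile(ys, q=95):
    """Nearest-rank q-th percentile of the observed peaks, ignoring input size."""
    ordered = sorted(ys)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical history for one abstract task: input size and peak memory, both in MB.
input_sizes = [100, 200, 300, 400]
peak_memory = [300, 500, 700, 900]

regression_estimate = predict_regression(input_sizes, peak_memory, 500)
percentile_estimate = predict_percentile(peak_memory)
```

In this made-up history the peaks grow exactly linearly with input size, so the regression extrapolates beyond the largest observed peak, whereas the percentile estimate can never exceed a value already seen. Differences of this kind are one reason a single method may not suit every memory demand pattern.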

Paper Structure (22 sections, 2 equations, 8 figures, 1 table, 1 algorithm)

Figures (8)

  • Figure 1: An abstract DAG with five tasks and a matching physical DAG with six physical tasks; circles are tasks, and arrows are dependencies.
  • Figure 2: Memory consumption depending on the input data for four exemplary tasks. In red, a linear regression, and in green, the regression shifted by the standard deviation between the predicted and the true value. The dashed blue line represents the 95th percentile.
  • Figure 3: Cumulative distribution of user-defined and actually used memory per core assigned to a task, with probabilities computed over all physical tasks.
  • Figure 4: Cumulative distribution of the difference in peak memory (in MB) between two runs of the same task.
  • Figure 5: RNA-Seq coverage analysis for Human and Drosophila input data
  • ...and 3 more figures