Predicting Dynamic Memory Requirements for Scientific Workflow Tasks

Jonathan Bader, Nils Diedrich, Lauritz Thamsen, Odej Kao

TL;DR

This paper proposes a novel online method that uses monitoring time series data to predict task memory usage. The method predicts a task's runtime, divides it into k equally sized segments, and learns the peak memory value for each segment as a function of the total file input size.

Abstract

With the increasing amount of data available to scientists in disciplines as diverse as bioinformatics, physics, and remote sensing, scientific workflow systems are becoming increasingly important for composing and executing scalable data analysis pipelines. When writing such workflows, users need to specify the resources to be reserved for tasks so that sufficient resources are allocated on the target cluster infrastructure. Crucially, underestimating a task's memory requirements can result in task failures. Therefore, users often resort to overprovisioning, resulting in significant resource wastage and decreased throughput. In this paper, we propose a novel online method that uses monitoring time series data to predict task memory usage in order to reduce the memory wastage of scientific workflow tasks. Our method predicts a task's runtime, divides it into k equally-sized segments, and learns the peak memory value for each segment depending on the total file input size. We evaluate the prototype implementation of our method using workflows from the publicly available nf-core repository, showing an average memory wastage reduction of 29.48% compared to the best state-of-the-art approach.
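
The three steps named in the abstract (predict the runtime, split it into k segments, learn a per-segment peak from the input size) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the shape of the `history` records, and the choice of a plain least-squares line per segment are hypothetical, and the runtime underprediction and memory offsets mentioned in the Figure 2 caption are left out.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b; constant model if xs do not vary."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


def segment_peaks(trace, k):
    """Split a memory trace into k near-equal segments; return each segment's peak."""
    n = len(trace)
    if n < k:
        raise ValueError("trace needs at least k samples")
    bounds = [i * n // k for i in range(k + 1)]
    return [max(trace[bounds[i]:bounds[i + 1]]) for i in range(k)]


def train(history, k):
    """history: (input_size, runtime, memory_trace) tuples of finished runs (assumed format)."""
    sizes = [h[0] for h in history]
    runtime_model = fit_line(sizes, [h[1] for h in history])
    peaks = [segment_peaks(h[2], k) for h in history]
    # One independent model per segment: peak of segment i as a function of input size.
    memory_models = [fit_line(sizes, [p[i] for p in peaks]) for i in range(k)]
    return runtime_model, memory_models


def predict(models, input_size):
    """Return (segment_length, per-segment memory allocations) for a new task."""
    (ra, rb), memory_models = models
    runtime = ra * input_size + rb
    allocations = [a * input_size + b for a, b in memory_models]
    return runtime / len(memory_models), allocations


if __name__ == "__main__":
    # Made-up traces: input size in GB, runtime in seconds, memory samples in GB.
    history = [
        (1.0, 10.0, [1, 2, 4, 3]),
        (2.0, 20.0, [2, 4, 8, 6]),
        (3.0, 30.0, [3, 6, 12, 9]),
    ]
    seg_len, allocations = predict(train(history, k=2), input_size=4.0)
    print(seg_len, allocations)
```

The result is a step function: each predicted allocation holds for one segment of the predicted runtime. Under these assumptions, an "online" variant would simply refit the models whenever another task run finishes and its trace is added to `history`.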

Paper Structure (16 sections, 2 equations, 8 figures)


Figures (8)

  • Figure 1: The figure shows a task's memory usage over time, the optimal, under-, and over-allocation when predicting a single peak memory value, as well as the associated optimization potential.
  • Figure 2: The figure provides a high-level overview of our k-Segments method (green) as applied in a scientific workflow environment. The model of our method learns time-dependent memory allocations by underpredicting the runtime, dividing a task's time series into $k$ equally distributed segments, learning the peak memory value per segment, and offsetting it. The predictions are updated during the runtime of a workflow execution and provided to the workflow management engine.
  • Figure 3: The figure shows the prediction model creation steps divided into runtime prediction and memory prediction.
  • Figure 4: Example of applying our k-Segments method to the adapter removal task with $k = 4$.
  • Figure 5: The figure illustrates the selective and partial retry strategies and their effect on a retried task execution.
  • ...and 3 more figures
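
The optimization potential described in the Figure 1 caption can be made concrete with a small worked example. This is a hedged sketch under stated assumptions: the trace values are invented, the helper names are hypothetical, wastage is taken to be allocated-minus-used memory summed over time, and the step-wise allocation uses the trace's own per-segment peaks (a best case with hindsight, not a prediction).

```python
def wastage(trace, allocation, dt=1.0):
    """Allocated-but-unused memory integrated over time (e.g. GB*s)."""
    if len(trace) != len(allocation):
        raise ValueError("trace and allocation must have the same length")
    if any(a < u for u, a in zip(trace, allocation)):
        # Under-allocation: in practice the task would fail and need a retry.
        raise ValueError("allocation falls below usage")
    return sum((a - u) * dt for u, a in zip(trace, allocation))


def static_allocation(trace):
    """Single peak value held for the whole runtime."""
    return [max(trace)] * len(trace)


def stepwise_allocation(trace, k):
    """Per-segment peak held for the duration of each of k near-equal segments."""
    n = len(trace)
    bounds = [i * n // k for i in range(k + 1)]
    allocation = []
    for i in range(k):
        segment = trace[bounds[i]:bounds[i + 1]]
        allocation.extend([max(segment)] * len(segment))
    return allocation


if __name__ == "__main__":
    trace = [1, 2, 2, 3, 8, 8, 4, 2]  # made-up memory samples in GB
    print(wastage(trace, static_allocation(trace)))
    print(wastage(trace, stepwise_allocation(trace, k=4)))
```

For this invented trace, the single-peak allocation reserves 8 GB throughout even though the task only needs that much for two samples, which is the gap a time-dependent allocation tries to close.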