Design Principles of Dynamic Resource Management for High-Performance Parallel Programming Models

Dominik Huber, Martin Schreiber, Martin Schulz, Howard Pritchard, Daniel Holmes

TL;DR

Based on a survey of existing approaches, design principles are proposed that form the basis of a holistic approach to DRM in HPC, and a prototype implementation using MPI is provided.

Abstract

With Dynamic Resource Management (DRM), the resources assigned to a job can be changed dynamically during its execution. From the system's perspective, DRM opens a new level of flexibility in resource allocation and job scheduling and therefore has the potential to improve system efficiency metrics such as the utilization rate, job throughput, energy efficiency, and responsiveness. From the application perspective, users can tailor the resources they request to their needs, offering potential optimizations in queuing time or charged costs. Despite these obvious advantages and many attempts over the last decade to establish DRM in HPC, it remains a concept discussed in academia rather than being successfully deployed on production systems. This stems from the fact that support for DRM requires changes in all layers of the HPC system software stack, including applications, programming models, process managers, and resource management software, as well as an extensive and holistic co-design process to establish new techniques and policies for scheduling and resource optimization. In this work, we therefore start with the assumption that resources are accessible by processes that either execute on them (e.g., on a CPU) or control them (e.g., GPU offloading). The overall DRM problem can then be decomposed into dynamic process management (DPM) and dynamic resource mapping or allocation (DRA). The former determines which processes (or which change in processes) must be managed, and the latter identifies the resources where they will be executed. The interfaces for such DPM/DRA in these layers need to be standardized, which requires a careful design to be interoperable while providing high flexibility. Based on a survey of existing approaches, we propose design principles that form the basis of a holistic approach to DRM in HPC and provide a prototype implementation using MPI.

Paper Structure

This paper contains 39 sections, 3 figures.

Figures (3)

  • Figure 1: Model for the relations between execution concepts, resources, and resource management software on HPC systems. Access to HPC resources (bottom) is controlled by the resource manager (top). Applications (right) require access to HPC resources to execute their workload. Applications access resources through application processes, which are typically abstracted by a parallel programming model and exposed to the application (directly or through application libraries) via a parallel programming interface. Processes are managed by the process manager (left) and exposed to the parallel programming model by the parallel runtime environment. Dynamic resource management requires coordination between applications and the resource manager across multiple software layers: the programming model (interface), the parallel runtime environment, and the process manager.
  • Figure 2: Design principles for a dynamic resource allocation interface.
  • Figure 3: Illustration of set operations in MPI. The figure shows an example of two set operations, GROW and ADD, requested by the application via calls to [...]. In both calls the same input PSet, P1, was specified, and for each operation a COL object was passed to provide optimization information. The call returns an MPI Request, which can be used in a call to [...] or [...] to check whether the operation has been executed, or passed to a subsequent call to update the optimization information. In this example the GROW operation has already been executed by the resource manager, i.e., the output PSets have been created and the operation is now pending. Pending operations can be queried by all processes using [...] until an explicit call to [...]. Each PSet has an associated data store where data can be stored and retrieved via the [...]. This example could, for instance, represent a job in which the processes in PSet P1 run a simulation. Processes could be added dynamically to the simulation (GROW) or started separately to run some in-situ task (ADD).
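
The life cycle described in the Figure 3 caption (request, execution by the resource manager, pending state, explicit finalization) can be illustrated with a small toy model. The sketch below is plain Python, not MPI. Every name in it (`ToyResourceManager`, `Request`, `PSet`, `Op`, the `num_procs` hint, and the naming of output PSets) is a hypothetical stand-in chosen for illustration and is not taken from the paper's prototype interface. It also assumes that GROW yields both the set of new processes and a merged set, which the caption does not specify.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Op(Enum):
    GROW = auto()  # add processes to the input PSet's computation
    ADD = auto()   # start separate processes, e.g. for an in-situ task


@dataclass
class PSet:
    name: str
    procs: frozenset
    store: dict = field(default_factory=dict)  # per-PSet key/value data store


@dataclass
class Request:
    op: Op
    input_pset: PSet
    hints: dict              # stands in for the COL optimization information
    outputs: list = field(default_factory=list)
    executed: bool = False   # True once the output PSets have been created
    finalized: bool = False  # True once the application has finalized it


class ToyResourceManager:
    """Hypothetical stand-in for the resource manager side (not MPI)."""

    def __init__(self, free_procs):
        self.free = set(free_procs)
        self.requests = []

    def request(self, op, pset, hints):
        # Non-blocking: only records the request and returns a handle.
        req = Request(op, pset, dict(hints))
        self.requests.append(req)
        return req

    def execute(self, req):
        # Assumption: the hint "num_procs" says how many processes are wanted.
        n = req.hints.get("num_procs", 1)
        if req.executed or len(self.free) < n:
            return False
        new = frozenset(sorted(self.free)[:n])
        self.free -= new
        delta = PSet(f"{req.input_pset.name}.{req.op.name.lower()}", new)
        if req.op is Op.GROW:
            # Assumption: GROW also yields the union of old and new processes.
            merged = PSet(f"{req.input_pset.name}+", req.input_pset.procs | new)
            req.outputs = [delta, merged]
        else:
            req.outputs = [delta]
        req.executed = True
        return True

    def pending(self):
        # Executed operations stay visible until they are finalized.
        return [r for r in self.requests if r.executed and not r.finalized]

    def finalize(self, req):
        req.finalized = True


rm = ToyResourceManager(free_procs=range(4, 10))
p1 = PSet("P1", frozenset(range(4)))
p1.store["step"] = 100  # data attached to the PSet
grow = rm.request(Op.GROW, p1, {"num_procs": 2})
add = rm.request(Op.ADD, p1, {"num_procs": 1})
rm.execute(grow)  # only GROW has been executed so far
print([r.op.name for r in rm.pending()])  # ['GROW']
```

As in the caption's example, the GROW request is listed as pending after execution, while the ADD request has been issued but not yet executed. Calling `rm.finalize(grow)` removes it from the pending list.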