Kub: Enabling Elastic HPC Workloads on Containerized Environments

Daniel Medeiros, Jacob Wahlgren, Gabin Schieffer, Ivy Peng

TL;DR

Kub is a methodology that enables elastic execution of HPC workloads on Kubernetes, so that the resources allocated to a job can be scaled dynamically during execution.

Abstract

The conventional model of resource allocation in HPC systems is static. As a result, a job cannot leverage newly available resources in the system or release underutilized resources during execution. In this paper, we present Kub, a methodology that enables elastic execution of HPC workloads on Kubernetes so that the resources allocated to a job can be dynamically scaled during execution. One main optimization of our method is to maximize the reuse of the originally allocated resources so that the disruption to the running job is minimized. The scaling procedure is coordinated among nodes through remote procedure calls on Kubernetes for deploying workloads in the cloud. We evaluate our approach using one synthetic benchmark and two production-level MPI-based HPC applications -- GROMACS and CM1. Our results demonstrate that the benefits of adapting the allocated resources depend on the workload characteristics. In the tested cases, a properly chosen scaling point for increasing resources during execution achieved up to 2x speedup. In addition, the overhead of checkpointing and data reshuffling significantly influences the selection of optimal scaling points, and choosing them requires application-specific knowledge.
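The trade-off between the scaling point and the checkpoint/reshuffle overhead can be illustrated with a back-of-the-envelope model. The sketch below is not the authors' model: it assumes ideal strong scaling (run time inversely proportional to the rank count) and a single fixed overhead per scaling event, and every function name and number in it is hypothetical.

```python
def elastic_runtime(t_base, ranks_before, ranks_after, scale_point, overhead):
    """Estimate wall time (seconds) of a job that scales once during execution.

    t_base       -- wall time of the whole job on `ranks_before` ranks (assumed known)
    scale_point  -- fraction of the work completed when scaling happens (0..1)
    overhead     -- assumed fixed cost of checkpoint + data reshuffle + restart

    Assumption: ideal strong scaling, which real applications rarely reach.
    """
    if not 0.0 <= scale_point <= 1.0:
        raise ValueError("scale_point must be between 0 and 1")
    if ranks_before <= 0 or ranks_after <= 0:
        raise ValueError("rank counts must be positive")
    time_before = scale_point * t_base
    time_after = (1.0 - scale_point) * t_base * ranks_before / ranks_after
    return time_before + overhead + time_after


def speedup(t_base, ranks_before, ranks_after, scale_point, overhead):
    """Speedup of the elastic run relative to the static run on `ranks_before` ranks."""
    return t_base / elastic_runtime(
        t_base, ranks_before, ranks_after, scale_point, overhead
    )


def break_even_overhead(t_base, ranks_before, ranks_after, scale_point):
    """Largest overhead for which scaling still does not slow the job down."""
    return (1.0 - scale_point) * t_base * (1.0 - ranks_before / ranks_after)


if __name__ == "__main__":
    # Hypothetical inputs: a 600 s job scaled from 2 to 6 ranks, 30 s overhead.
    for point in (0.3, 0.5, 0.7):
        s = speedup(600.0, 2, 6, point, 30.0)
        limit = break_even_overhead(600.0, 2, 6, point)
        print(f"scale at {point:.0%}: speedup {s:.2f}, break-even overhead {limit:.0f} s")
```

Under these assumptions, a later scaling point leaves less remaining work to accelerate, so the same overhead consumes a larger share of the gain. This is one simple way to see why the choice of scaling point depends on the application's checkpointing cost.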

Paper Structure

This paper contains 27 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: The overall structure of a Volcano deployment in Kubernetes. I) The job, defined in a YAML file, is submitted to the api-server (or kube-api). II) This call is intercepted by Volcano's admission controller, which checks that the YAML has all required fields and contains no errors. III) If everything is correct, the data is sent to the scheduler, which verifies whether the job can be allocated. IV) When the resources are available, the kubelet on each node is instructed to create the pods with the desired containers for the job. A pod can contain one or more containers, and a node can run more than one pod. V) The "master" pod waits for all "worker" pods to become active and start their ssh daemons. When they are ready, orted starts the MPI job among them. VI) All the pods are encapsulated by a "service" type, so they can communicate with each other by domain name. Volcano's scheduler and controller, responsible for monitoring the jobs, effectively replace the ones included in Kubernetes by default.
  • Figure 2: The decision flowchart of Kub during execution. One of the major benefits of Kub is the ability to use non-provisioned infrastructure, as any newly created pod can join the others by exchanging SSH keys with the Coordinator. Furthermore, the control loops ensure that scaling can be performed multiple times during the application's execution.
  • Figure 3: Calculated overhead of the applications used in this work. The label "Kub" means that the applications were started using the custom launcher that coordinates the scaling, although no scaling was performed, while "Volcano" is the traditional way of running MPI applications on Kubernetes. For this experiment, each application was executed using three MPI ranks, one per Kubernetes pod.
  • Figure 4: Sensitivity test of increased compute intensity and benefit from scaling up from 2 to 6 ranks at three scaling points $30\%$, $50\%$, $70\%$, respectively.
  • Figure 5: The results for the elastic scaling performed by this work. Each case was executed 3 times, totalling 36 experiments per application. The label on the X axis refers to the amount of resources that is being introduced into the application, while the colours for each bar refer to when the scaling was performed. Refer to Sections \ref{sec:hscale} and \ref{sec:disc} for an extensive discussion about this figure.