Running Cloud-native Workloads on HPC with High-Performance Kubernetes

Antony Chazapis; Evangelos Maliaroudakis; Fotis Nikolaidis; Manolis Marazakis; Angelos Bilas

TL;DR

This paper explores a more practical alternative to running Cloud and HPC platforms on separate resources: executing unmodified Cloud-native workloads directly on the main HPC cluster, which avoids resource partitioning and retains the HPC center's existing job management and accounting policies.

Abstract

The escalating complexity of applications and services encourages a shift towards higher-level data processing pipelines that integrate both Cloud-native and HPC steps into the same workflow. Cloud providers and HPC centers typically provide both execution platforms on separate resources. In this paper we explore a more practical design that enables running unmodified Cloud-native workloads directly on the main HPC cluster, avoiding resource partitioning and retaining the HPC center's existing job management and accounting policies.

Paper Structure

The paper contains 8 sections and 3 figures.

Introduction
Related work
Design & implementation
Evaluation
Spark TPC-DS
Argo Workflows
Distributed ML Training
Conclusion

Figures (3)

Figure 1: Components involved in a typical Kubernetes deployment on bare-metal.
Figure 2: HPK translates Kubernetes workloads to Slurm and Singularity/Apptainer.
Figure 3: HPK architecture.
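Figure 2 summarizes HPK's core idea: Kubernetes workloads are translated into Slurm jobs whose containers run under Singularity/Apptainer. This summary does not show the translation itself, so the block below is a minimal, hypothetical sketch of what such a mapping could look like, assuming one pod becomes one Slurm batch job and each container becomes one `apptainer` process inside it. The function names (`pod_to_sbatch`, `parse_cpu`, `parse_memory_mb`) and the default resource values are illustrative assumptions, not HPK's actual code; only the `#SBATCH` directives and the `apptainer exec`/`apptainer run` subcommands are standard Slurm and Apptainer usage.

```python
import math
import shlex


def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("2", "500m") to whole Slurm CPUs, rounding up."""
    if quantity.endswith("m"):
        return max(1, math.ceil(int(quantity[:-1]) / 1000))
    return max(1, math.ceil(float(quantity)))


def parse_memory_mb(quantity: str) -> int:
    """Convert a Kubernetes memory quantity to Slurm megabytes.

    Only binary suffixes (Ki, Mi, Gi, Ti) and plain byte counts are handled here.
    """
    factors = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024, "Ti": 1024 * 1024}
    for suffix, factor in factors.items():
        if quantity.endswith(suffix):
            return math.ceil(float(quantity[: -len(suffix)]) * factor)
    return math.ceil(int(quantity) / (1024 * 1024))  # plain number = bytes


def pod_to_sbatch(pod: dict) -> str:
    """Hypothetical translation of a minimal pod dict into a Slurm batch script."""
    name = pod["metadata"]["name"]
    containers = pod["spec"]["containers"]

    def request(c: dict, key: str, default: str) -> str:
        # Assumed defaults when a container declares no resource request.
        return c.get("resources", {}).get("requests", {}).get(key, default)

    # Sum per-container requests (each rounded up) into one Slurm allocation.
    cpus = sum(parse_cpu(request(c, "cpu", "1")) for c in containers)
    mem_mb = sum(parse_memory_mb(request(c, "memory", "256Mi")) for c in containers)

    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={name}",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_mb}M",
    ]
    for c in containers:
        image = f"docker://{c['image']}"
        argv = c.get("command", []) + c.get("args", [])
        if argv:
            # Explicit command: run it inside the container image.
            lines.append(f"apptainer exec {image} {shlex.join(argv)} &")
        else:
            # No command given: fall back to the image's default entrypoint.
            lines.append(f"apptainer run {image} &")
    lines.append("wait")  # block until every container process has exited
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    pod = {
        "metadata": {"name": "hello"},
        "spec": {"containers": [{
            "name": "main",
            "image": "busybox:1.36",
            "command": ["echo"],
            "args": ["hello"],
            "resources": {"requests": {"cpu": "500m", "memory": "2Gi"}},
        }]},
    }
    print(pod_to_sbatch(pod))
```

A script generated this way would be submitted with `sbatch`, so the workload is queued, scheduled, and accounted for by Slurm like any other HPC job, which is the property the abstract emphasizes. A real system would also have to handle pod networking, volumes, status reporting, and lifecycle events, all of which this sketch ignores.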
