Hydra: Brokering Cloud and HPC Resources to Support the Execution of Heterogeneous Workloads at Scale

Aymen Alsaadi, Shantenu Jha, Matteo Turilli

TL;DR

Hydra addresses the challenge of executing heterogeneous workloads across cloud and HPC platforms by providing a general-purpose brokering layer that can concurrently provision resources on commercial clouds, NSF-sponsored clouds, and HPC systems. It uses a connector-based Python architecture with a Provider Proxy and a Service Proxy, the latter including CaaS, HPC, and Data managers, to map and execute tasks as executables or containers while avoiding full workflow management. The paper contributes (1) a design for brokering in the presence of task, platform, resource, and middleware heterogeneity, (2) a reference implementation of that design, Hydra, (3) an experimental characterization of Hydra's overheads and scaling, and (4) an end-to-end demonstration on the FACTS sea-level workflow, showing cross-platform scalability. The results show that Hydra incurs minimal overhead relative to platform costs and scales under both strong and weak scaling, enabling large-scale, cross-platform scientific workflows with flexible resource choices.
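
The two-proxy layout described above can be pictured with a minimal Python sketch. This is not Hydra's actual API: every class, method, and field name below (`Task`, `ProviderProxy.register`, `ServiceProxy.submit`, and so on) is hypothetical, and the routing rule (cloud tasks go to a CaaS manager, HPC tasks go to an HPC manager) is an assumption made for illustration. The Data manager is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    """A unit of work: a plain executable or a container (hypothetical fields)."""
    name: str
    provider: str                  # platform the task is mapped to
    executable: Optional[str] = None
    image: Optional[str] = None    # set only for containerized tasks

    @property
    def is_container(self) -> bool:
        return self.image is not None


class ProviderProxy:
    """Records which providers are available and whether each is cloud or HPC."""

    def __init__(self) -> None:
        self._kinds: Dict[str, str] = {}

    def register(self, provider: str, kind: str) -> None:
        if kind not in ("cloud", "hpc"):
            raise ValueError(f"unknown platform kind: {kind}")
        self._kinds[provider] = kind

    def kind_of(self, provider: str) -> str:
        return self._kinds[provider]


class Manager:
    """Stand-in manager that only records the tasks routed to it."""

    def __init__(self) -> None:
        self.submitted: List[Task] = []

    def submit(self, task: Task) -> None:
        self.submitted.append(task)


class CaaSManager(Manager):
    """Would run tasks on cloud container services (assumed role)."""


class HPCManager(Manager):
    """Would run tasks on HPC platforms (assumed role)."""


class ServiceProxy:
    """Routes each task to the manager that matches its provider's kind."""

    def __init__(self, providers: ProviderProxy) -> None:
        self.providers = providers
        self.managers: Dict[str, Manager] = {
            "cloud": CaaSManager(),
            "hpc": HPCManager(),
        }

    def submit(self, tasks: List[Task]) -> None:
        for task in tasks:
            kind = self.providers.kind_of(task.provider)
            self.managers[kind].submit(task)


# Usage: one containerized cloud task and one executable HPC task.
providers = ProviderProxy()
providers.register("aws", "cloud")
providers.register("bridges2", "hpc")

proxy = ServiceProxy(providers)
proxy.submit([
    Task(name="t0", provider="aws", image="python:3.11"),
    Task(name="t1", provider="bridges2", executable="/bin/hostname"),
])
```

The sketch only mirrors the separation of concerns named in the summary: provider details live in one proxy, task execution in per-platform managers behind the other. It provisions nothing and does no workflow management.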

Abstract

Scientific discovery increasingly depends on middleware that enables the execution of heterogeneous workflows on heterogeneous platforms. One of the main challenges is to design software components that integrate within the existing ecosystem to enable scale and performance across cloud and high-performance computing (HPC) platforms. Researchers are met with a varied computing landscape, which includes services available on commercial cloud platforms, data and network capabilities specifically designed for scientific discovery on government-sponsored cloud platforms, and scale and performance on HPC platforms. We present Hydra, an intra/cross-cloud HPC brokering system capable of concurrently acquiring resources from commercial/private cloud and HPC platforms, and managing the execution of heterogeneous workflow applications on those resources. This paper offers four main contributions: (1) the design of brokering capabilities in the presence of task, platform, resource, and middleware heterogeneity; (2) a reference implementation of that design with Hydra; (3) an experimental characterization of Hydra's overheads and strong/weak scaling with heterogeneous workloads and platforms; and (4) the implementation of a workflow that models sea rise with Hydra and its scaling on cloud and HPC platforms.

Paper Structure (12 sections, 5 figures, 1 table)

This paper contains 12 sections, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Hydra Architecture.
  • Figure 2: Weak and strong scaling of Hydra's OVH (top) and TH (middle), and cloud provider TPT (bottom). Measured on Jetstream2, Chameleon, Azure, and AWS with MCPP (a, b, c) and SCPP (d, e, f). Weak scaling: 4K/4, 8K/8, 16K/16 tasks/vCPUs; strong scaling: 4K/[4,8,16], 8K/[4,8,16], 16K/[4,8,16] tasks/vCPUs.
  • Figure 3: Aggregated TPT, OVH and TH on CHI, JET2, AWS, and Azure with MCPP (top) and SCPP (bottom).
  • Figure 4: Aggregated TPT, OVH, and TH on four cloud providers (CHI, JET2, AWS, Azure) and ACCESS Bridges2 HPC platform with homogeneous (top) and heterogeneous (bottom) workloads and resources.
  • Figure 5: FACTS strong (right) and weak (left) scaling on Jetstream2 (blue), AWS (green) and Bridges2 (orange).
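
The weak- and strong-scaling configurations listed in the Figure 2 caption can be enumerated with a short sketch. It only restates the task/vCPU pairs from the caption and computes the tasks-per-vCPU ratio for each. Task counts are kept in units of "K" exactly as written, since the caption does not say whether K means 1000 or 1024. The variable and function names are illustrative.

```python
from fractions import Fraction

# Values taken from the Figure 2 caption; task counts are in units of "K".
TASK_COUNTS_K = [4, 8, 16]
VCPU_COUNTS = [4, 8, 16]

# Weak scaling: tasks and vCPUs grow together (4K/4, 8K/8, 16K/16).
weak = list(zip(TASK_COUNTS_K, VCPU_COUNTS))

# Strong scaling: each task count is held fixed while vCPUs vary over [4, 8, 16].
strong = {t: [(t, v) for v in VCPU_COUNTS] for t in TASK_COUNTS_K}


def k_tasks_per_vcpu(tasks_k: int, vcpus: int) -> Fraction:
    """Load per vCPU, in K tasks, for one configuration."""
    return Fraction(tasks_k, vcpus)


weak_ratios = [k_tasks_per_vcpu(t, v) for t, v in weak]
strong_ratios = {
    t: [k_tasks_per_vcpu(*cfg) for cfg in cfgs] for t, cfgs in strong.items()
}

for t, v in weak:
    print(f"weak   {t}K tasks / {v} vCPUs -> {k_tasks_per_vcpu(t, v)}K per vCPU")
for t, cfgs in strong.items():
    for _, v in cfgs:
        print(f"strong {t}K tasks / {v} vCPUs -> {k_tasks_per_vcpu(t, v)}K per vCPU")
```

In the weak-scaling runs the load per vCPU stays constant at 1K tasks, while in the strong-scaling runs the load per vCPU shrinks as vCPUs are added to a fixed workload (for example, 16K tasks go from 4K to 1K tasks per vCPU as vCPUs go from 4 to 16).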