Simplifying HPC resource selection: A tool for optimizing execution time and cost on Azure

Marco A. S. Netto, Wolfgang De Salvador, Davide Vanzo

TL;DR

The paper addresses how to select HPC cloud resources on Azure so that both execution time and cost stay low. It introduces HPCAdvisor, an open-source tool that automates cloud provisioning, benchmarking, output management, and plotting, and that generates Pareto-front recommendations with time and cost as the two objectives. To reduce the number of cloud runs, the approach predicts execution times from existing data in two modes: extrapolating times across VM types for the same application input (via a BFGS-optimized scaling factor), and scaling times across application inputs for the same VM type. Initial experiments with OpenFOAM and the LAMMPS Lennard-Jones benchmark on HC, HBv2, and HBv3 VMs, using up to 16 VMs (1920 cores), show a substantial reduction in the number of scenarios that must be executed and highlight how strongly key application parameters affect resource needs. The work suggests that HPCAdvisor can speed up early tuning on new hardware and help users with limited IT expertise reach favorable cost-performance configurations.
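
To make the first prediction mode more concrete, the sketch below fits a single scaling factor that maps a known execution-time curve from one VM type onto a few measured points from another VM type, then extrapolates the remaining points. The summary above states that the paper uses BFGS for this optimization; for one multiplicative factor under a squared-error objective the minimizer has a closed form, which this sketch uses instead. The function names, the squared-error objective, and all timing values are illustrative assumptions, not taken from the paper or from HPCAdvisor.

```python
def fit_scaling_factor(reference_times, measured_times):
    """Return the factor s that minimizes sum((s * ref - meas) ** 2).

    Closed-form least squares for a single parameter (an assumption of
    this sketch; the paper reports a BFGS-based optimization).
    """
    numerator = sum(r * m for r, m in zip(reference_times, measured_times))
    denominator = sum(r * r for r in reference_times)
    if denominator == 0:
        raise ValueError("reference times must not all be zero")
    return numerator / denominator


def predict_curve(reference_curve, measured_points):
    """Scale a full reference curve so it matches a few measured points.

    Both arguments map number of VMs -> execution time in seconds.
    """
    shared = sorted(set(reference_curve) & set(measured_points))
    if not shared:
        raise ValueError("need at least one VM count present in both curves")
    factor = fit_scaling_factor(
        [reference_curve[n] for n in shared],
        [measured_points[n] for n in shared],
    )
    return {n: factor * t for n, t in reference_curve.items()}


# Hypothetical timings: a full curve on one VM type, two runs on another.
reference = {1: 800.0, 2: 420.0, 4: 230.0, 8: 130.0, 16: 80.0}
measured = {1: 600.0, 16: 60.0}
predicted = predict_curve(reference, measured)
```

In this made-up example only two runs on the second VM type are needed to estimate the other three points, which is the kind of scenario reduction the summary describes.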

Abstract

Azure Cloud offers a wide range of resources for running HPC workloads, requiring users to configure their deployment by selecting VM types, number of VMs, and processes per VM. Suboptimal decisions may lead to longer execution times or additional costs for the user. We are developing an open-source tool to assist users in making these decisions by considering application input parameters, as they influence resource consumption. The tool automates the time-consuming process of setting up the cloud environment, executing the benchmarking runs, handling output, and providing users with resource selection recommendations as high-level insights on run times and costs across different VM types and numbers of VMs. In this work, we present initial results and insights on reducing the number of cloud executions needed to provide such guidance, leveraging data analytics and optimization techniques with two well-known HPC applications: OpenFOAM and LAMMPS.
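
One way to turn measured run times and costs into a resource recommendation is a Pareto front over the two objectives: keep only the configurations for which no other configuration is at least as good on both time and cost and strictly better on one. The sketch below is a minimal illustration of that filter. The `Scenario` class, the cost formula (time multiplied by number of VMs and an hourly price), the prices, and the timings are all hypothetical and are not HPCAdvisor's data model or Azure's actual pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    sku: str                  # VM type, e.g. "HBv3"
    nnodes: int               # number of VMs
    time_s: float             # measured or predicted execution time
    price_per_hour: float     # hypothetical per-VM hourly price

    @property
    def cost(self) -> float:
        """Cost of the run, assuming billing proportional to run time."""
        return self.time_s / 3600.0 * self.nnodes * self.price_per_hour


def dominates(a: Scenario, b: Scenario) -> bool:
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a.time_s <= b.time_s and a.cost <= b.cost
    better = a.time_s < b.time_s or a.cost < b.cost
    return no_worse and better


def pareto_front(scenarios):
    """Return the non-dominated scenarios, fastest first."""
    front = [
        s for s in scenarios
        if not any(dominates(other, s) for other in scenarios)
    ]
    return sorted(front, key=lambda s: s.time_s)


# Made-up numbers for illustration only.
scenarios = [
    Scenario("HC", 4, 400.0, 3.0),
    Scenario("HBv2", 4, 300.0, 3.5),
    Scenario("HBv3", 8, 150.0, 3.6),
    Scenario("HBv3", 16, 100.0, 3.6),
    Scenario("HBv2", 16, 120.0, 3.5),
]
front = pareto_front(scenarios)
```

With these placeholder values, the HC run on 4 VMs and the HBv2 run on 16 VMs drop out because another configuration is both faster and cheaper; the remaining three represent different trade-offs between time and cost.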

Paper Structure

This paper contains 4 sections, 4 figures.

Figures (4)

  • Figure 1: Predicting the execution time curve using data from a different VM type and the same application input (example for OpenFOAM).
  • Figure 2: Predicting the execution time for the same VM type but with a different application input parameter (example for OpenFOAM).
  • Figure 3: Predicting the execution time curve using data from a different VM type and the same application input (example for LAMMPS).
  • Figure 4: Predicting the execution time for the same VM type but with a different application input parameter (example for LAMMPS).