On the Challenges of Energy-Efficiency Analysis in HPC Systems: Evaluating Synthetic Benchmarks and Gromacs

Rafael Ravedutti Lucio Machado, Jan Eitzinger, Georg Hager, Gerhard Wellein

TL;DR

The paper analyzes the challenges of assessing energy efficiency in HPC by comparing synthetic benchmarks with the Gromacs molecular dynamics package on heterogeneous Fritz and Alex clusters. It leverages MD-Bench, STREAM, GEMM, and Gromacs across CPU and GPU environments to study how frequency, power caps, and affinity influence energy-to-solution and energy-delay products, while highlighting measurement overheads and profiling pitfalls. Key contributions include a detailed account of instrumentation overhead, the limitations of hardware performance counters, and best-practice recommendations for rigorous benchmarking in energy-focused HPC studies. The findings emphasize the need for end-to-end measurements, careful affinity control, and cross-validation across tools to produce reliable, generalizable insights, and call for standardized, open interfaces for power measurement across platforms.

Abstract

This paper discusses the challenges encountered when analyzing the energy efficiency of synthetic benchmarks and the Gromacs package on the Fritz and Alex HPC clusters. Experiments were conducted using MPI parallelism on full sockets of Intel Ice Lake and Sapphire Rapids CPUs, as well as Nvidia A40 and A100 GPUs. The metrics and measurements obtained with the Likwid and Nvidia profiling tools are presented, along with the results. The challenges and pitfalls encountered during experimentation and analysis are revealed and discussed. Best practices for future energy efficiency analysis studies are suggested.

Paper Structure

This paper contains 16 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: EDP versus measured frequencies for synthetic benchmarks on Sapphire Rapids and Gromacs (both Sapphire Rapids and Ice Lake). In Figure (a), which shows the synthetic benchmarks, the EDP is normalized. The marker colors represent the performance obtained on the Gromacs benchmarks in ns/day.
  • Figure 2: Z-plot showing the energy-to-solution versus performance when running different Gromacs benchmarks at different frequencies (measured) on both CPUs and GPUs. The marker colors in Figure (c) represent the GPU graphics frequency setting.
  • Figure 3: EDP versus frequencies and power-cap settings for synthetic benchmarks and Gromacs on A40 and A100 GPUs. In Figure (a), which shows the synthetic benchmarks, the EDP is normalized.
  • Figure 4: Average power draw versus measured frequencies for Gromacs on CPUs. Dotted lines show the measurements from uncore frequencies for each case. Figure (c) shows the measurements for cases in which the ranks are distributed between the two sockets.
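
The figures above plot energy-to-solution and the energy-delay product (EDP) against frequency. As a minimal sketch of how these two metrics are conventionally derived from a run's average power and runtime, the snippet below assumes the standard definitions (energy = average power × runtime, EDP = energy × runtime). The `Run` record, the function names, and the choice to normalize against the lowest EDP in the set are all illustrative assumptions; this page does not state which normalization the authors used, and the numbers in the usage example are made up, not measurements from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """One benchmark run (hypothetical record, not the paper's data format)."""
    freq_ghz: float      # measured clock frequency
    runtime_s: float     # wall-clock time to solution
    avg_power_w: float   # average power draw over the run


def energy_to_solution(run: Run) -> float:
    """Energy in joules: average power multiplied by runtime."""
    return run.avg_power_w * run.runtime_s


def edp(run: Run) -> float:
    """Energy-delay product in joule-seconds: energy multiplied by runtime."""
    return energy_to_solution(run) * run.runtime_s


def normalized_edp(runs: List[Run]) -> List[float]:
    """EDP of each run divided by the smallest EDP in the set.

    Normalizing by the minimum is an assumption made for this sketch.
    """
    values = [edp(r) for r in runs]
    reference = min(values)
    return [v / reference for v in values]


# Illustrative values only (not taken from the paper).
runs = [
    Run(freq_ghz=2.0, runtime_s=10.0, avg_power_w=100.0),
    Run(freq_ghz=1.0, runtime_s=20.0, avg_power_w=60.0),
]
for run, rel in zip(runs, normalized_edp(runs)):
    print(f"{run.freq_ghz} GHz: E = {energy_to_solution(run):.0f} J, "
          f"EDP = {edp(run):.0f} Js, normalized = {rel:.2f}")
```

In this made-up example the slower run draws less power but takes twice as long, so both its energy-to-solution and its EDP come out worse; because EDP weights runtime twice, it penalizes the slowdown more heavily than energy-to-solution alone does.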