Energy Efficiency trends in HPC: what high-energy and astrophysicists need to know

Estela Suarez, Jorge Amaya, Martin Frank, Oliver Freyermuth, Maria Girone, Bartosz Kostrzewa, Susanne Pfalzner

TL;DR

The paper addresses rising HPC energy demands by surveying hardware, software, programming models, data management, and domain-specific workloads, with targeted guidance for the astrophysics, HEP, and lattice field theory communities. It employs a framework of performance- and energy-informed practices, including performance analysis tools, modular architectures, and portable libraries, to improve energy efficiency without sacrificing throughput. Key contributions include domain-specific recommendations, an emphasis on single-node optimization, and the promotion of open data and reproducibility to enhance efficiency. The practical impact is to help domain scientists and developers reduce energy usage, improve scientific throughput per watt, and adapt to increasingly heterogeneous and dynamic HPC environments.

Abstract

The growing energy demands of HPC systems have made energy efficiency a critical concern for system developers and operators. However, HPC users are generally less aware of how these energy concerns influence the design, deployment, and operation of supercomputers, even though they experience the consequences. This paper examines the implications of HPC's energy consumption, providing an overview of current trends aimed at improving energy efficiency. We describe how hardware innovations such as energy-efficient processors, novel system architectures, power management techniques, and advanced scheduling policies have a direct impact on how applications need to be programmed and executed on HPC systems. For application developers, understanding how these new systems work and how to analyse and report the performance of their own software is critical in their dialogue with HPC system designers and administrators. The paper aims to raise awareness about energy efficiency among users, particularly in the high-energy physics and astrophysics domains, offering practical advice on how to analyse and optimise applications to reduce their energy consumption without compromising on performance.
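To make "reducing energy consumption without compromising on performance" concrete, here is a minimal sketch of two common reporting metrics: energy-to-solution (sampled power integrated over the runtime) and the energy-delay product (EDP = E × T). It assumes the power samples come from whatever node- or job-level measurement a site exposes; the function names and all numbers are hypothetical illustrations, not taken from the paper.

```python
"""Energy-to-solution and energy-delay product: an illustrative sketch."""


def energy_to_solution(times_s, power_w):
    """Integrate sampled power (W) over time (s) with the trapezoidal rule; returns joules."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    energy_j = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j


def energy_delay_product(energy_j, runtime_s):
    """EDP = E * T (J*s); lower is better, and it penalises saving energy by running much slower."""
    return energy_j * runtime_s


if __name__ == "__main__":
    # Hypothetical: the same job in a "fast" configuration (400 W for 30 s)
    # and a "slow" power-capped configuration (250 W for 40 s).
    t_fast = [0.0, 10.0, 20.0, 30.0]
    p_fast = [400.0, 400.0, 400.0, 400.0]
    t_slow = [0.0, 10.0, 20.0, 30.0, 40.0]
    p_slow = [250.0, 250.0, 250.0, 250.0, 250.0]

    e_fast = energy_to_solution(t_fast, p_fast)  # 12000.0 J
    e_slow = energy_to_solution(t_slow, p_slow)  # 10000.0 J
    print(e_fast, e_slow)
    print(energy_delay_product(e_fast, 30.0), energy_delay_product(e_slow, 40.0))
```

In this made-up case the slower configuration consumes less energy (10 kJ vs 12 kJ) but has the worse EDP (400 000 J·s vs 360 000 J·s), which is why reporting both energy and time-to-solution is more informative than either alone.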

Paper Structure

This paper contains 22 sections and 13 figures.

Figures (13)

  • Figure 1: Statistics of allocated compute time per scientific domain from November 2024 to April 2025 on the JUWELS supercomputer Alvarez2021. Left: the CPU module (JUWELS Cluster). Right: the GPU module (JUWELS Booster). Astrophysics and HEP (the latter largely consisting of LQCD applications) together account for between one quarter and one third of the available compute time. Data Source: GCS/JSC. Data available in plots_data_zenodo.
  • Figure 2: Evolution of processor architectures over time: from single-core CPUs, to many-core CPUs, to heterogeneous compute nodes with CPUs and GPUs. The most recent trend is towards specialised chip designs that are internally heterogeneous, combining different processing technologies in the form of chiplets.
  • Figure 3: Schematic description of the architectures of a CPU (left) and a GPU (right). CPUs typically contain a limited number of large, complex, highly capable processing cores, while GPUs contain thousands of simple arithmetic execution units (increasingly designed for low-precision operations). CPUs are therefore well suited to latency-limited applications, while GPUs best serve highly parallel workloads.
  • Figure 4: High-level view of the software stack running on HPC systems. The product names given as examples are merely illustrative and are not intended as a comprehensive list of all possible solutions.
  • Figure 5: Measured and extrapolated parallel efficiency of a kinetic space plasma code using DIMEMAS dimemas, reproduced from Figure 26.a (page 63) in deepest:2021. The parallel efficiency, representing the fraction of time spent on computation (useful work), is composed of the load balance, serialisation, and transfer efficiencies, as defined in section 4.1 of deepest:2021 and in the DIMEMAS dimemas documentation. In this example, three measurement points (at 1, 24, and 96 processes) were used to extrapolate the parallel efficiency of the code up to one million processes.
  • ...and 8 more figures
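The decomposition described in the Figure 5 caption can be sketched with POP-style metrics (from the Performance Optimisation and Productivity Centre of Excellence), assuming they match the definitions in section 4.1 of deepest:2021: load balance LB = mean(useful time) / max(useful time), serialisation efficiency SerE = max(useful time) / T_ideal, and transfer efficiency TE = T_ideal / T, where T is the measured runtime and T_ideal the runtime on an ideal network (what a DIMEMAS replay can estimate). Their product telescopes to PE = mean(useful time) / T. The extrapolation step fits a hypothetical one-parameter model e(P) = 1 / (1 + a(P - 1)) to each factor; this model is an assumption for illustration and not necessarily the one used in deepest:2021. All function names and measurement values are hypothetical.

```python
"""POP-style parallel-efficiency decomposition and a naive scaling extrapolation (sketch)."""


def pop_efficiencies(useful_s, runtime_s, ideal_runtime_s):
    """Decompose parallel efficiency as PE = LB * SerE * TE.

    useful_s        -- per-process time spent in useful computation (s)
    runtime_s       -- measured wall-clock time of the analysed region (s)
    ideal_runtime_s -- runtime on an ideal, zero-cost network (e.g. from a DIMEMAS replay)
    """
    avg_useful = sum(useful_s) / len(useful_s)
    max_useful = max(useful_s)
    load_balance = avg_useful / max_useful
    serialisation = max_useful / ideal_runtime_s
    transfer = ideal_runtime_s / runtime_s
    return {
        "load_balance": load_balance,
        "serialisation": serialisation,
        "transfer": transfer,
        # The product telescopes to avg_useful / runtime_s.
        "parallel": load_balance * serialisation * transfer,
    }


def fit_amdahl_like(procs, effs):
    """Least-squares fit of the ASSUMED model e(P) = 1 / (1 + a * (P - 1)).

    Linearised as 1/e - 1 = a * (P - 1), a line through the origin, so
    a = sum(x * y) / sum(x * x) with x = P - 1 and y = 1/e - 1.
    """
    xs = [p - 1 for p in procs]
    ys = [1.0 / e - 1.0 for e in effs]
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("need at least one measurement with more than one process")
    a = sum(x * y for x, y in zip(xs, ys)) / sxx
    return lambda p: 1.0 / (1.0 + a * (p - 1))


if __name__ == "__main__":
    # Hypothetical run: 4 processes, region took 10 s, ideal-network replay 9 s.
    m = pop_efficiencies([8.0, 7.5, 7.0, 6.5], runtime_s=10.0, ideal_runtime_s=9.0)
    print({k: round(v, 3) for k, v in m.items()})

    # Hypothetical per-factor values at 1, 24 and 96 processes (cf. the three points in Figure 5).
    procs = [1, 24, 96]
    factors = {
        "load_balance": [1.0, 0.97, 0.94],
        "serialisation": [1.0, 0.99, 0.98],
        "transfer": [1.0, 0.96, 0.90],
    }
    models = {name: fit_amdahl_like(procs, vals) for name, vals in factors.items()}
    for p in (96, 1_000, 1_000_000):
        pe = 1.0
        for model in models.values():
            pe *= model(p)  # extrapolated PE is the product of the extrapolated factors
        print(p, round(pe, 3))
```

Fitting each factor separately, rather than only their product, indicates which source of inefficiency (load imbalance, serialisation, or data transfer) would dominate at large process counts under the assumed model.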