Energy-aware operation of HPC systems in Germany

Estela Suarez, Hendryk Bockelmann, Norbert Eicker, Jan Eitzinger, Salem El Sayed, Thomas Fieseler, Martin Frank, Peter Frech, Pay Giesselmann, Daniel Hackenberg, Georg Hager, Andreas Herten, Thomas Ilsche, Bastian Koller, Erwin Laure, Cristina Manzano, Sebastian Oeste, Michael Ott, Klaus Reuter, Ralf Schneider, Kay Thust, Benedikt von St. Vieth

TL;DR

This paper describes state-of-the-art strategies and innovations used to improve the energy efficiency of HPC systems in Germany, with the aim of motivating more sustainable and energy-efficient HPC architectures and operations.

Abstract

High-Performance Computing (HPC) systems are among the most energy-intensive scientific facilities, with electric power consumption reaching and often exceeding 20 megawatts per installation. Unlike other major scientific infrastructures such as particle accelerators or high-intensity light sources, of which there are only a few worldwide, supercomputers are continuously increasing in number and size. Even if every new system generation is more energy efficient than the previous one, the overall growth of the HPC infrastructure, driven by rising demand for computational capacity across all scientific disciplines and especially by artificial intelligence (AI) workloads, rapidly drives up energy demand. This challenge is particularly significant for HPC centers in Germany, where high electricity costs, stringent national energy policies, and a strong commitment to environmental sustainability are key factors. This paper describes various state-of-the-art strategies and innovations employed to enhance the energy efficiency of HPC systems within this national context. Case studies from leading German HPC facilities illustrate the implementation of novel heterogeneous hardware architectures, advanced monitoring infrastructures, high-temperature cooling solutions, energy-aware scheduling, and dynamic power management, among other optimizations. By reviewing best practices and ongoing research, this paper aims to share valuable insights with the global HPC community, motivating the pursuit of more sustainable and energy-efficient HPC operations.

Paper Structure

This paper contains 19 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Energy efficiency for FP64 floating-point throughput of a selection of CPUs (left) and GPUs (right). Values are determined from the theoretical peak performance and TDP of one socket/GPU, using the highest SKU of each generation. For CPUs, the frequency used to determine peak performance is the lowest frequency measured with a very hot benchmark. For GPUs, the base frequency is used, assuming continued computations. For GPUs, results are shown with and without tensor cores. The graphs compare similar but not identical frequency types (measured vs. computed), so comparability across the two graphs is limited.
  • Figure 2: Average electricity price for new industrial consumers in Germany. Annual consumption 160 000 to 20 million kWh, medium-voltage supply. Data source BDEW.
  • Figure 3: Typical components of a monitoring setup in HPC cluster environments.