Introducing MareNostrum5: A European pre-exascale energy-efficient system designed to serve a broad spectrum of scientific workloads

Fabio Banchelli, Marta Garcia-Gasulla, Filippo Mantovani, Joan Vinyals, Josep Pocurull, David Vicente, Beatriz Eguzkitza, Flavio C. C. Galeazzo, Mario C. Acosta, Sergi Girona

TL;DR

MareNostrum5 is a European pre-exascale system designed for energy-efficient, broad-spectrum HPC workloads. The paper presents a comprehensive evaluation across micro-benchmarks, HPC benchmarks (HPL/HPCG), and real applications (Alya, OpenFOAM, IFS), using the EAR framework to quantify power and the impact of direct liquid cooling. Key findings show near-peak CPU performance, substantial memory bandwidth gains with HBM (albeit with higher energy costs), and strong but workload-dependent scalability for CFD and weather/climate models, with notable memory- and communication-bound limitations at large scales. The results provide practical guidance for users to optimize configurations and workload placement to maximize performance and energy efficiency on MareNostrum5.

Abstract

MareNostrum5 is a pre-exascale supercomputer at the Barcelona Supercomputing Center (BSC), part of the EuroHPC Joint Undertaking. With a peak performance of 314 petaflops, MareNostrum5 features a hybrid architecture comprising Intel Sapphire Rapids CPUs, NVIDIA Hopper GPUs, and DDR5 and high-bandwidth memory (HBM), organized into four partitions optimized for diverse workloads. This document evaluates MareNostrum5 through micro-benchmarks (floating-point performance, memory bandwidth, interconnect throughput), HPC benchmarks (HPL and HPCG), and application studies using Alya, OpenFOAM, and IFS. It highlights MareNostrum5's scalability, efficiency, and energy performance, utilizing the EAR (Energy Aware Runtime) framework to assess power consumption and the effects of direct liquid cooling. Additionally, HBM and DDR5 configurations are compared to examine memory performance trade-offs. Designed to complement standard technical documentation, this study provides insights to guide both new and experienced users in optimizing their workloads and maximizing MareNostrum5's computational capabilities.

Paper Structure

This paper contains 92 sections, 15 equations, 31 figures, 6 tables.

Figures (31)

  • Figure 1: High level components connectivity of a MareNostrum5 GPP tray which houses two compute nodes.
  • Figure 2: High level components connectivity of a MareNostrum5 ACC compute node.
  • Figure 3: Network diagram of MareNostrum5.
  • Figure 4: Single-node performance and power efficiency.
  • Figure 5: Average performance per thread, standard deviation, and performance balance.
  • ...and 26 more figures