Green computing toward SKA era with RICK

Giovanni Lacopo, Claudio Gheller, Emanuele De Rubeis, Pascal Jahan Elahi, Maciej Cytowski, Luca Tornatore, Giuliano Taffoni, Ugo Varetto

TL;DR

The paper tackles SKA-scale data processing by introducing RICK, a w-stacking imaging code designed for energy-aware HPC on heterogeneous architectures. It evaluates three parallelization strategies—MPI, hybrid MPI/OpenMP, and GPU acceleration—and introduces green productivity, $GP = \frac{T_0/T_N}{\alpha E_N/E_0}$ with $\alpha=1$, to jointly assess time-to-solution and energy-to-solution. Through LOFAR data tests on Setonix hardware, GPU-accelerated configurations deliver substantial improvements in both speed and energy efficiency at scale, while single-node scenarios may favor hybrid approaches due to I/O and data-transfer considerations. The work highlights the importance of energy-aware, GPU-accelerated imaging for SKA-era pipelines and points to distributed FFTs for AMD GPUs as a path to further performance and energy gains. Overall, RICK demonstrates how careful hardware-software co-design can achieve sustainable, high-throughput radio-imaging workloads on next-generation HPC systems.
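As a minimal sketch of how the green productivity formula above could be evaluated, the snippet below computes $GP$ from measured runtimes and energies. It assumes that $T_0$ and $E_0$ belong to the reference configuration and $T_N$ and $E_N$ to the configuration under test; the function name, argument names, and sample numbers are hypothetical and are not taken from the RICK code or from the paper's measurements.

```python
def green_productivity(t_ref, t_run, e_ref, e_run, alpha=1.0):
    """Return GP = (T_0 / T_N) / (alpha * E_N / E_0).

    t_ref, e_ref: time- and energy-to-solution of the reference run (T_0, E_0).
    t_run, e_run: the same quantities for the configuration under test (T_N, E_N).
    alpha: weight on the energy term; the summary above uses alpha = 1.
    """
    if min(t_ref, t_run, e_ref, e_run) <= 0 or alpha <= 0:
        raise ValueError("times, energies and alpha must be positive")
    speedup = t_ref / t_run        # > 1 when the tested run is faster
    energy_ratio = e_run / e_ref   # < 1 when the tested run uses less energy
    return speedup / (alpha * energy_ratio)


# Illustrative (made-up) numbers: reference run 100 s and 50 kJ,
# tested run 25 s and 20 kJ.
gp = green_productivity(t_ref=100.0, t_run=25.0, e_ref=50.0, e_run=20.0)
print(f"GP = {gp:.2f}")
```

With $\alpha = 1$, the reference configuration has $GP = 1$ by construction, and a configuration that is both faster and less energy-hungry than the reference scores above 1.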

Abstract

The Square Kilometer Array (SKA) is expected to generate hundreds of petabytes of data per year, two orders of magnitude more than current radio interferometers. Data processing at this scale requires advanced High Performance Computing (HPC) resources. However, modern HPC platforms consume up to tens of megawatts (MW), and the energy-to-solution of algorithms will become of utmost importance in the near future. In this work we study the trade-off between energy-to-solution and time-to-solution of our RICK code (Radio Imaging Code Kernels), a novel implementation of the w-stacking algorithm designed to run on state-of-the-art HPC systems. The code can run on heterogeneous systems, exploiting their accelerators. We ran both single-node and multi-node tests with both CPU and GPU solutions, in order to determine which is the greenest and which is the fastest. We then defined the green productivity, a quantity that relates the energy-to-solution and time-to-solution of a given code configuration to those of a reference configuration. Configurations with the highest green productivity are the most efficient ones. The tests were run on the Setonix machine at the Pawsey Supercomputing Research Centre (PSC) in Perth (WA), ranked 28th in the Top500 list as of June 2024.

Paper Structure

This paper contains 16 sections, 4 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: RICK code workflow, highlighting the 5 main algorithmic components and the level of HPC enabling (MPI, multithreading, GPU) of each component.
  • Figure 2: Green productivity of the different hybrid MPI+OpenMP configurations and of the CPU+GPU configuration, relative to the pure MPI run.
  • Figure 3: Fraction of runtime spent in the reduce operation as a function of the number of computing nodes, for different CPU frequencies.
  • Figure 4: Left: energy saving compared to the highest CPU frequency for default (blue), medium (orange), low (green) CPU frequencies as a function of computing nodes. Right: performance degradation of default, medium and low CPU frequencies compared to the highest CPU frequency as a function of computing nodes.
  • Figure 5: Left: ratio between the energy in the pure CPU case and the energy in the CPU+GPU case, at different CPU frequencies, as a function of computing nodes. Right: ratio between the CPU runtime and CPU+GPU runtime, at different CPU frequencies, as a function of computing nodes. In the GPU tests both CPU and GPU frequencies are set by the OS to their default values.
  • ...and 1 more figure