EPIC: An Energy-Efficient, High-Performance GPGPU Computing Research Infrastructure

Magnus Själander, Magnus Jahre, Gunnar Tufte, Nico Reissmann

TL;DR

EPIC addresses the need for energy-efficient, high-performance computation for data-parallel workloads by integrating a large-scale GPGPU infrastructure within NTNU's Idun cluster. The project organizes 158 GPGPUs across five configurations (EPIC1–EPIC5) and leverages InfiniBand networking and Lustre storage to provide a unified, scalable resource. This infrastructure has enabled diverse research in energy-efficient resource management, nanomagnetic modeling, and 3D object identification, and supports numerous PhD and MSc theses. Collectively, EPIC strengthens NTNU's HPC/AI capabilities, reduces time-to-solution for large-scale experiments, and enhances global competitiveness in computational research.

Abstract

Many research questions can only be pursued with massive computational resources. State-of-the-art simulation of physical processes, the training of neural networks for deep learning, and the analysis of big data all depend on the availability of sufficient, high-performing computational resources. For such research, access to a high-performance computing infrastructure is indispensable. Many scientific workloads from these domains are inherently parallel and can benefit from the data-parallel architecture of general-purpose graphics processing units (GPGPUs). However, GPGPU resources are scarce in Norway's national infrastructure. EPIC is a GPGPU-enabled computing research infrastructure at NTNU. It enables NTNU's researchers to perform experiments that would otherwise be infeasible because the time-to-solution would be too long.

Paper Structure

This paper contains 7 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: The topology of the Idun cluster with the EPIC research infrastructure.