EPIC: An Energy-Efficient, High-Performance GPGPU Computing Research Infrastructure
Magnus Själander, Magnus Jahre, Gunnar Tufte, Nico Reissmann
TL;DR
EPIC addresses the need for energy-efficient, high-performance computation for data-parallel workloads by integrating a large-scale GPGPU infrastructure within NTNU's Idun cluster. The project organizes 158 GPGPUs across five configurations (EPIC1–EPIC5) and leverages InfiniBand networking and Lustre storage to provide a unified, scalable resource. This infrastructure has enabled diverse research in energy-efficient resource management, nanomagnetic modeling, and 3D object identification, and supports numerous PhD and MSc theses. Collectively, EPIC strengthens NTNU's HPC/AI capabilities, reduces time-to-solution for large-scale experiments, and enhances global competitiveness in computational research.
Abstract
The pursuit of many research questions requires massive computational resources. State-of-the-art research in physical processes using simulations, the training of neural networks for deep learning, and the analysis of big data all depend on the availability of sufficient and performant computational resources. For such research, access to a high-performance computing infrastructure is indispensable. Many scientific workloads from these research domains are inherently parallel and can benefit from the data-parallel architecture of general-purpose graphics processing units (GPGPUs). However, GPGPU resources are scarce in Norway's national infrastructure. EPIC is a GPGPU-enabled computing research infrastructure at NTNU. It enables NTNU's researchers to perform experiments that would otherwise be impossible because the time-to-solution would simply be too long.
