Porting HPC Applications to AMD Instinct$^\text{TM}$ MI300A Using Unified Memory and OpenMP
Suyash Tandon, Leopold Grinberg, Gheorghe-Teodor Bercea, Carlo Bertolli, Mark Olesen, Simone Bnà, Nicholas Malaya
TL;DR
The paper addresses the challenge of porting high-performance computing (HPC) applications to the AMD Instinct MI300A by leveraging its unified memory architecture and the OpenMP target offloading model. It presents a practical OpenMP-based programming blueprint that unifies the host and device data environments, enabling directive-based offloading with minimal code changes, and demonstrates it by porting OpenFOAM. Key contributions include a detailed workflow for using unified_shared_memory on MI300A, a case study in which OpenFOAM is ported with on the order of 100 code modifications, and empirical results showing MI300A delivering substantial speedups over discrete GPUs, attributed to the elimination of page migrations in a single shared memory. The work is practically significant for production HPC: it offers a scalable, maintainable path to accelerating large codes on APUs with less data duplication and lower programming complexity.
Abstract
AMD Instinct$^\text{TM}$ MI300A is the world's first data center accelerated processing unit (APU) with memory shared between the AMD "Zen 4" EPYC$^\text{TM}$ cores and third-generation CDNA$^\text{TM}$ compute units. A single memory space offers several advantages: i) it eliminates the need for data replication and costly data transfers, ii) it substantially simplifies application development and allows an incremental acceleration of applications, iii) it is easy to maintain, and iv) its potential can be well realized via the abstractions in the OpenMP 5.2 standard, where the host and the device data environments can be unified in a more performant way. In this article, we provide a blueprint of the APU programming model leveraging unified memory and highlight key distinctions compared to the conventional approach with discrete GPUs. OpenFOAM, an open-source C++ library for computational fluid dynamics, is presented as a case study to emphasize the flexibility and ease of offloading a full-scale, production-ready application on MI300 APUs using directive-based OpenMP programming.
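To make the unified host and device data environment concrete, the following is a minimal sketch of directive-based offloading with the OpenMP `requires unified_shared_memory` directive. It is an illustration under stated assumptions, not code from OpenFOAM or from the paper: the function name `axpy` and its arguments are hypothetical, and the two vectors are assumed to have equal length. The point it shows is that ordinary host allocations are passed to the target region without `map` clauses or explicit copies.

```cpp
#include <cstddef>
#include <vector>

// Tells the OpenMP implementation that host pointers are valid on the device.
// The directive must appear at file scope, before any target construct.
#pragma omp requires unified_shared_memory

// Hypothetical helper (not from the paper): computes y = a * x + y.
// Assumes x.size() == y.size().
void axpy(double a, const std::vector<double>& x, std::vector<double>& y)
{
    const std::size_t n = x.size();

    // Plain host allocations owned by std::vector; no map clauses and no
    // separate device buffers are needed under unified shared memory.
    const double* xp = x.data();
    double* yp = y.data();

    // Offloaded when a suitable target device is available; otherwise the
    // loop runs on the host. Without OpenMP enabled, the pragmas are ignored
    // and the loop runs serially.
    #pragma omp target teams distribute parallel for
    for (std::size_t i = 0; i < n; ++i)
    {
        yp[i] += a * xp[i];
    }
}
```

Calling `axpy(2.0, x, y)` from unmodified host code updates `y` in place. With a clang-based offloading compiler, a build line along the lines of `-fopenmp --offload-arch=gfx942` would typically be used for MI300A; the exact flags and runtime settings depend on the toolchain and should be treated as an assumption here.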
