
Towards a Scalable and Efficient PGAS-based Distributed OpenMP

Baodi Shan, Mauricio Araya-Polo, Barbara Chapman

TL;DR

DiOMP offers a promising alternative to the traditional MPI+OpenMP hybrid programming model, providing a more productive and efficient way to develop high-performance parallel applications for distributed-memory systems.

Abstract

MPI+X has been the de facto standard for distributed-memory parallel programming. It is primarily used as an explicit two-sided communication model, which often leads to complex and error-prone code. Alternatively, the PGAS model offers efficient one-sided communication and more intuitive communication primitives. In this paper, we present a novel approach that integrates PGAS concepts into the OpenMP programming model, leveraging the LLVM compiler infrastructure and the GASNet-EX communication library. Our model addresses the complexity of traditional MPI+OpenMP programming while delivering excellent performance and scalability. We evaluate our approach using a set of micro-benchmarks and application kernels on two distinct platforms: Ookami at Stony Brook University and NERSC Perlmutter. The results demonstrate that DiOMP achieves higher bandwidth and lower latency than MPI+OpenMP, with up to 25% higher bandwidth and up to 45% lower latency. DiOMP offers a promising alternative to the traditional MPI+OpenMP hybrid programming model, providing a more productive and efficient way to develop high-performance parallel applications for distributed-memory systems.

Paper Structure

This paper contains 15 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Memory management model of DiOMP. Each node has its own private space (green) and a shared global space (striped), with the global space further divided into aligned space (orange) and unaligned space (red). The white parts represent unused (unallocated) memory space.
  • Figure 2: Workflow of ompx_lock() and ompx_unlock(), built on Active Messages, in the presence of contention.
  • Figure 3: Micro-benchmark for bandwidth on Ookami
  • Figure 4: Micro-benchmark for bandwidth on Perlmutter. For messages of size $10^6$, PGAS+OpenMP achieves $25\%$ higher bandwidth than MPI+OpenMP.
  • Figure 5: Micro-benchmark for latency on Ookami. Across message sizes, PGAS+OpenMP latency is on average $45\%$ lower than MPI+OpenMP latency.
  • ...and 3 more figures