UNR: Unified Notifiable RMA Library for HPC

Guangnan Feng, Jiabin Xie, Dezun Dong, Yutong Lu

TL;DR

This work proposes UNR, a Unified Notifiable RMA library for HPC that addresses multi-NIC aggregation, portability, hardware-software co-design, and usability, and deploys it on four HPC systems, each with a different interconnect.

Abstract

Remote Memory Access (RMA) enables direct access to remote memory to achieve high performance for HPC applications. However, most modern parallel programming models lack a mechanism for the remote process to detect the completion of RMA operations. Many previous works have proposed programming models and extensions to notify the communication peer, but they did not solve the problems of multi-NIC aggregation, portability, hardware-software co-design, and usability. In this work, we propose a Unified Notifiable RMA (UNR) library for HPC to address these challenges. In addition, we demonstrate best practices for using UNR within a real-world scientific application, PowerLLEL. We deployed UNR across four HPC systems, each with a different interconnect. The results show that PowerLLEL powered by UNR is accelerated by up to 36% on 1728 nodes of the Tianhe-Xingyi supercomputing system.

Paper Structure (26 sections, 7 figures, 3 tables, 3 algorithms)

Figures (7)

  • Figure 1: Communication Protocols and Operations
  • Figure 2: Multi-channel Multi-message Aggregated Signal. The receiver waits for messages from two senders until the counter reaches $0$. Sender1 divides the large message into four sub-messages and transfers them through four NICs.
  • Figure 3: Optimizing PowerLLEL using UNR
  • Figure 4: Latency Test.
  • Figure 5: UNR Ping-pong Tests with Calculation. Sharing NICs can improve throughput because (a) it allows some messages to be received and calculated in advance, and (b) absorbs the load imbalance in computation.
  • ...and 2 more figures
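
The counter-based signal described in the Figure 2 caption can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not UNR's API: the names `AggregatedSignal`, `put_chunk`, and `run_demo` are hypothetical, threads stand in for NICs, the counter is assumed to count sub-messages, and the second sender is assumed to send a single message.

```python
import threading


class AggregatedSignal:
    """Hypothetical counter-based signal (illustration only, not the UNR API)."""

    def __init__(self, expected):
        # Assumption: the counter starts at the number of expected sub-messages.
        self._remaining = expected
        self._cond = threading.Condition()

    def notify_arrival(self):
        # Called once per delivered sub-message; wakes waiters when the counter hits 0.
        with self._cond:
            self._remaining -= 1
            if self._remaining == 0:
                self._cond.notify_all()

    def wait(self, timeout=None):
        # Blocks until the counter reaches 0; returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._remaining == 0, timeout)


def put_chunk(buffer, offset, chunk, signal):
    # Stand-in for a notifiable RMA put: write the data, then decrement the counter.
    buffer[offset:offset + len(chunk)] = chunk
    signal.notify_arrival()


def run_demo():
    msg1 = bytes(range(16))   # large message from sender 1
    msg2 = b"\xff" * 4        # single message from sender 2 (assumed)
    num_nics = 4              # sender 1 splits its message across four "NICs"
    buffer = bytearray(len(msg1) + len(msg2))
    signal = AggregatedSignal(expected=num_nics + 1)

    chunk = len(msg1) // num_nics
    threads = [
        threading.Thread(
            target=put_chunk,
            args=(buffer, i * chunk, msg1[i * chunk:(i + 1) * chunk], signal),
        )
        for i in range(num_nics)
    ]
    threads.append(
        threading.Thread(target=put_chunk, args=(buffer, len(msg1), msg2, signal))
    )
    for t in threads:
        t.start()

    completed = signal.wait(timeout=5.0)  # receiver waits on one signal for both senders
    for t in threads:
        t.join()
    return completed, bytes(buffer)


if __name__ == "__main__":
    ok, data = run_demo()
    print(ok, data.hex())
```

The receiver waits on a single signal regardless of how many senders or channels contribute, which is the property the caption highlights; how UNR actually encodes and updates the counter is not specified in this excerpt.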