Solutions for Distributed Memory Access Mechanism on HPC Clusters
Jan Meizner, Maciej Malawski
TL;DR
The paper investigates remote memory access mechanisms for multi-node HPC clusters, motivated by genomics workflows such as STAR. It compares an LD_PRELOAD–VFS approach over Lustre with an MPI-based RDMA memory access method. Experiments show that MPI-based remote memory access can approach local memory performance, while the VFS approach degrades under heavier workloads. The work highlights MPI-based memory sharing as a promising path for multi-tenant HPC environments and proposes future FUSE-based MPI memory sharing to support closed-source software.
Abstract
This paper presents and evaluates several mechanisms for remote memory access in distributed systems, tested on two distinct HPC clusters. We compare solutions based on shared storage and on MPI (over InfiniBand and Slingshot) against local memory access. We also describe the medical use cases that would benefit most from the described solution. We found that the performance of remote access, especially when backed by MPI, is similar to that of local memory access.
