Parallel Spawning Strategies for Dynamic-Aware MPI Applications

Iker Martín-Álvarez, José I. Aliaga, Maribel Castillo, Sergio Iserte

TL;DR

The paper tackles the challenge of dynamic resource management (DRM) in MPI by introducing parallel spawning strategies that enable malleability with minimal reconfiguration overhead. It develops two algorithms, Hypercube for homogeneous systems and Iterative Diffusive for heterogeneous or shared environments, and integrates them into the MaM framework to support scalable expansion and fast shrink operations. Experimental results on both homogeneous and heterogeneous clusters show that expansion overhead remains within $1.25\times$, while shrink operations can be accelerated by at least $20\times$, with the Merge method often outperforming the alternative approaches. These contributions advance practical DRM for large-scale HPC, including shared-resource contexts, and point to future work on reducing synchronization and data-redistribution costs.

Abstract

Dynamic resource management is an increasingly important capability of High Performance Computing systems, as it enables jobs to adjust their resource allocation at runtime. This capability has been shown to reduce workload makespan, substantially decrease job waiting times, and improve overall system utilization. In this context, malleability refers to the ability of applications to adapt to new resource allocations during execution. Although beneficial, malleability incurs significant reconfiguration costs, making the reduction of these costs an important research topic. Some existing methods for MPI applications respawn the entire application, an expensive solution that does not reuse the original processes. Other MPI methods reuse them, but fail to fully release unneeded processes when shrinking: some ranks within the same communicator remain active across nodes, which prevents the application from returning those nodes to the system. This work overcomes both limitations by proposing a novel parallel spawning strategy, in which all processes cooperate in spawning before redistribution, thereby reducing execution time. Additionally, it removes the shrinking limitations, allowing parallel systems to adapt better to their workload and reducing its makespan. As a result, it preserves competitive expansion times with at most a $1.25\times$ overhead, while enabling fast shrink operations that reduce their cost by at least $20\times$. This strategy has been validated on both homogeneous and heterogeneous systems and can also be applied in shared-resource environments.
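
The idea that all processes cooperate in spawning can be illustrated with a hypercube-style doubling schedule, which is consistent with Figure 1 (7 new groups generated in 3 steps). The Python sketch below is a hypothetical illustration, not the paper's implementation: the function name, the group numbering (group 0 is the original application), and the rule "in step k, every existing group g spawns group g + 2^k" are assumptions. It does not call MPI; in a real code each (parent, child) pair would presumably correspond to a collective MPI_Comm_spawn issued by the parent group.

```python
def hypercube_spawn_schedule(new_groups):
    """Return, per step, the (parent, child) spawn pairs.

    Hypothetical sketch: group 0 is the original application and
    groups 1..new_groups are spawned. In each step every existing
    group spawns one new group, so the group count doubles until
    the target is reached.
    """
    total = new_groups + 1  # include the original group
    schedule = []
    existing = 1            # only group 0 exists initially
    while existing < total:
        step = []
        for parent in range(existing):
            child = parent + existing  # existing == 2**k in step k
            if child < total:
                step.append((parent, child))
        schedule.append(step)
        existing = min(2 * existing, total)
    return schedule


if __name__ == "__main__":
    for k, pairs in enumerate(hypercube_spawn_schedule(7), start=1):
        print(k, pairs)
    # 1 [(0, 1)]
    # 2 [(0, 2), (1, 3)]
    # 3 [(0, 4), (1, 5), (2, 6), (3, 7)]
```

Under this assumed schedule, creating G new groups takes ceil(log2(G + 1)) spawning steps, because the number of groups able to spawn doubles after every step.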

Paper Structure

This paper contains 14 sections, 9 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Generation of 7 groups using parallel spawning. Each step is represented with a different marker.
  • Figure 2: Synchronization of 6 spawned groups and the initial node. The arrows indicate the direction of messages, and stripes indicate that the group must use a Barrier.
  • Figure 3: Connection of 7 spawned groups in 3 steps. In each step, the number of groups is halved, and connection operations may be unordered.
  • Figure 4: Resize times in a homogeneous system.
  • Figure 5: Median reconfiguration times in MNV.
  • ...and 1 more figure
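
Figure 3 describes connecting 7 spawned groups in 3 steps, halving the number of groups each time (7, 4, 2, 1). That pattern matches a pairwise, binary-tree-style merge, sketched below in Python. This is a hypothetical illustration, not the paper's algorithm: the function name, the pairing of neighbouring groups, and the rule that the lower-numbered group survives are assumptions. In MPI, each connection would presumably be built with MPI_Comm_accept / MPI_Comm_connect followed by MPI_Intercomm_merge; the sketch only computes the schedule.

```python
def pairwise_connection_schedule(groups):
    """Return, per step, the (receiver, sender) pairs that connect.

    Hypothetical sketch: active groups are paired with their neighbour;
    each pair merges into the receiver, so the number of active groups
    is halved (rounded up) in every step.
    """
    active = list(range(groups))
    schedule = []
    while len(active) > 1:
        step = []
        survivors = []
        for i in range(0, len(active), 2):
            if i + 1 < len(active):
                step.append((active[i], active[i + 1]))
            # The receiver (or an unpaired group) stays active.
            survivors.append(active[i])
        schedule.append(step)
        active = survivors
    return schedule


if __name__ == "__main__":
    for k, pairs in enumerate(pairwise_connection_schedule(7), start=1):
        print(k, pairs)
    # 1 [(0, 1), (2, 3), (4, 5)]
    # 2 [(0, 2), (4, 6)]
    # 3 [(0, 4)]
```

Pairs are listed in a fixed order only for readability; since the pairs within a step are independent, they can complete in any order, which agrees with the note in Figure 3 that connection operations may be unordered.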