Communication-Aware Diffusion Load Balancing for Persistently Interacting Objects

Maya Taylor; Kavitha Chandrasekar; Laxmikant V. Kale

Abstract

Parallel applications with irregular and time-varying workloads often suffer from load imbalance. Dynamic load balancing techniques address this challenge by redistributing work during execution. We present a new distributed, diffusion-based load balancing strategy targeted at communication-intensive applications with persistently communicating objects. Leveraging the application's communication graph, our strategy reduces inter-node communication while simultaneously distributing load effectively. We also propose an algorithmic variant for cases where the communication patterns are not readily available. We explore optimizations to our algorithm and compare it with related load balancing strategies, both in simulation and on a Particle-in-Cell benchmark on up to 8 nodes of Perlmutter at NERSC.

Paper Structure (18 sections, 6 figures, 2 tables)

Introduction
Related Work and Problem Definition
Three-stage Communication-Aware Diffusion
Neighbor Selection
Virtual Load Balancing
Object Selection
Hierarchical Load Balancing
Coordinate Variant
Evaluation in Simulation
Coordinate vs Communication-Based Diffusion
Neighbor Count Selection
Comparison with Other Strategies in Simulation
Performance
PIC PRK Load Imbalance Patterns
Particle Count Under Load Balancing
...and 3 more sections

Figures (6)

Figure 1: Load visualizations of a synthetic 2D stencil application. Circles denote migratable objects, and colors denote the owning processor. These figures were generated using our load balancing simulation infrastructure for diffusion (left) and greedy-refine strategies (right).
Figure 2: Object migration in a 2D stencil benchmark using 16 processors and an initial tiled decomposition. Imbalance is introduced synthetically so each object's load is randomly increased or decreased by 40%. Both strategies use 4 neighbors.
Figure 3: Evolution of the particle distribution across processors over time in the PIC PRK benchmark, using $k=2$ and $\rho=0.9$.
Figure 4: Ratio of maximum to average number of particles per processor over time in the PIC PRK benchmark, using $k=2$, $\rho=0.9$, and 4 processors. Load balancing is performed every 10 iterations, and Diffusion strategies use 4 neighbors.
Figure 5: Strong scaling of the PIC PRK benchmark on Perlmutter, comparing communication-based Diffusion and GreedyRefine load balancing strategies, using 10 million particles on a 6000x6000 grid with parameters $k=4$ and $\rho=0.9$, and scaling the number of chares with the number of nodes.
...and 1 more figures
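
The captions above refer to two quantities that are easy to state in code: the synthetic imbalance of Figure 2 (each object's load raised or lowered by 40%) and the max-to-average ratio of Figure 4. The sketch below shows one way to compute them. The helper names, the fixed seed, the unit base load, and the simple blocked object-to-processor mapping (a stand-in for a true 2D tiling) are assumptions for illustration, not the authors' simulation infrastructure.

```python
import random


def perturb_loads(base_loads, fraction=0.4, seed=0):
    """Randomly scale each object's load up or down by `fraction`."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    return [w * (1 + rng.choice((-fraction, fraction))) for w in base_loads]


def per_processor(loads, owner, num_procs):
    """Sum object loads by owning processor."""
    totals = [0.0] * num_procs
    for w, p in zip(loads, owner):
        totals[p] += w
    return totals


def max_to_avg(values):
    """Imbalance metric: 1.0 means perfectly balanced."""
    return max(values) / (sum(values) / len(values))


# Hypothetical setup: 16 processors, 4 objects each, blocked mapping.
num_procs, objs_per_proc = 16, 4
owner = [i // objs_per_proc for i in range(num_procs * objs_per_proc)]
loads = perturb_loads([1.0] * len(owner))
ratio = max_to_avg(per_processor(loads, owner, num_procs))
```

The same `max_to_avg` helper applies unchanged to per-processor particle counts, which is the quantity plotted in Figure 4.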
