
Communication-Aware Diffusion Load Balancing for Persistently Interacting Objects

Maya Taylor, Kavitha Chandrasekar, Laxmikant V. Kale

Abstract

Parallel applications with irregular and time-varying workloads often suffer from load imbalance. Dynamic load balancing techniques address this challenge by redistributing work during execution. We present a new distributed, diffusion-based load balancing strategy targeted at communication-intensive applications with persistently communicating objects. By leveraging the application's communication graph, our strategy reduces inter-node communication while still distributing load effectively. We also propose an algorithmic variant for cases where communication patterns are not readily available. We explore optimizations to our algorithm and compare it with related load balancing strategies, both in simulation and on a Particle-in-Cell benchmark on up to 8 nodes of Perlmutter at NERSC.
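
To make the term "diffusion-based load balancing" concrete, the following is a minimal sketch of a classic first-order diffusion sweep over a processor neighbor graph. It is not the strategy proposed in this paper: it treats load as a divisible quantity rather than as migratable objects, and it ignores the communication graph entirely. The names `diffusion_step`, `imbalance`, `ring`, and the coefficient `alpha` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def diffusion_step(loads: List[float],
                   neighbors: Dict[int, List[int]],
                   alpha: float) -> List[float]:
    """One first-order diffusion sweep (illustrative assumption, not the paper's method).

    Each processor i exchanges alpha * (load_j - load_i) with every neighbor j.
    With a symmetric neighbor graph, the total load is conserved.
    """
    new_loads = list(loads)
    for i, nbrs in neighbors.items():
        for j in nbrs:
            new_loads[i] += alpha * (loads[j] - loads[i])
    return new_loads


def imbalance(loads: List[float]) -> float:
    """Ratio of the maximum load to the average load (1.0 means perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical example: 4 processors arranged in a ring, 2 neighbors each.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
loads = [10.0, 2.0, 2.0, 2.0]

for sweep in range(5):
    print(f"sweep {sweep}: loads={loads}, max/avg={imbalance(loads):.3f}")
    loads = diffusion_step(loads, ring, alpha=0.25)
```

Each sweep uses only information from a processor's immediate neighbors, which is what makes diffusion schemes distributed. In this sketch `alpha` times the neighbor count is kept below 1 so that the iteration stays stable.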

Paper Structure (18 sections, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Load visualizations of a synthetic 2D stencil application. Circles denote migratable objects, and colors denote the owning processor. These figures were generated using our load balancing simulation infrastructure for diffusion (left) and greedy-refine strategies (right).
  • Figure 2: Object migration in a 2D stencil benchmark using 16 processors and an initial tiled decomposition. Imbalance is introduced synthetically so each object's load is randomly increased or decreased by 40%. Both strategies use 4 neighbors.
  • Figure 3: Evolution of the particle distribution across processors over time in the PIC PRK benchmark, using $k=2$ and $\rho=0.9$.
  • Figure 4: Ratio of the maximum to the average number of particles per processor over time in the PIC PRK benchmark, using $k=2$, $\rho=0.9$, and 4 processors. Load balancing is performed every 10 iterations, and the Diffusion strategies use 4 neighbors.
  • Figure 5: Strong scaling of the PIC PRK benchmark on Perlmutter, comparing the communication-based Diffusion and GreedyRefine load balancing strategies, using 10 million particles on a 6000x6000 grid with parameters $k=4$ and $\rho=0.9$, and scaling the number of chares with the number of nodes.
  • ...and 1 more figures