A Communication- and Memory-Aware Model for Load Balancing Tasks

Jonathan Lifflander; Philippe P. Pebay; Nicole L. Slattengren; Pierre L. Pebay; Robert A. Pfeiffer; Joseph D. Kotulski; Sean T. McGovern

TL;DR

The paper tackles load balancing in distributed-memory systems under strict memory constraints by introducing CCM, a reduced-order model that jointly accounts for computation, communication, and memory. It proposes CCM-LB, a fully distributed heuristic load balancer, and assesses how close its solutions are to optimal by comparing them against mixed-integer linear program (MILP) formulations (COMCP and FWMP). The Gemma electromagnetics code serves as a practical testbed, where the approach achieves up to 2.3x speedups and is evaluated at several scales, aided by a neural time predictor trained on diverse configurations. The work offers a principled pathway to load balancing irregular workloads under memory constraints, with potential relevance to exascale, task-based, memory-bound applications.

Abstract

While load balancing in distributed-memory computing has been well-studied, we present an innovative approach to this problem: a unified, reduced-order model that combines three key components to describe "work" in a distributed system: computation, communication, and memory. Our model enables an optimizer to explore complex tradeoffs in task placement, such as increased parallelism at the expense of data replication, which increases memory usage. We propose a fully distributed, heuristic-based load balancing optimization algorithm, and demonstrate that it quickly finds close-to-optimal solutions. We formalize the complex optimization problem as a mixed-integer linear program, and compare it to our strategy. Finally, we show that when applied to an electromagnetics code, our approach obtains up to 2.3x speedups for the imbalanced execution.
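The tradeoff mentioned in the abstract (more parallelism at the cost of replicating data) can be illustrated with a toy sketch. This is not the paper's CCM formulation: the `Task` type, the `rank_work` and `max_work` helpers, the weights `alpha` and `beta`, and all numbers are hypothetical, chosen only to show how a memory limit can make a more parallel placement infeasible.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    load: float   # estimated compute time (hypothetical units)
    block: str    # id of the shared memory block the task needs

def rank_work(tasks, block_size, mem_limit, comm_volume=0.0, alpha=1.0, beta=1.0):
    """Toy per-rank work: weighted compute plus communication,
    or infinity when the rank's memory limit is exceeded."""
    # Each distinct block is stored once per rank, however many tasks use it.
    footprint = sum(block_size[b] for b in {t.block for t in tasks})
    if footprint > mem_limit:
        return math.inf
    return alpha * sum(t.load for t in tasks) + beta * comm_volume

def max_work(placement, block_size, mem_limits):
    """Work of the most loaded rank; placement[i] lists the tasks on rank i."""
    return max(rank_work(ts, block_size, lim)
               for ts, lim in zip(placement, mem_limits))

tasks = [Task(1.0, "A") for _ in range(4)]   # four tasks sharing block "A"
block_size = {"A": 8.0}

serial = [tasks, []]               # all tasks on rank 0, block stored once
split = [tasks[:2], tasks[2:]]     # two tasks per rank, block replicated

print(max_work(serial, block_size, [10.0, 10.0]))  # 4.0
print(max_work(split, block_size, [10.0, 10.0]))   # 2.0
print(max_work(split, block_size, [10.0, 4.0]))    # inf
```

With enough memory on both ranks, splitting halves the maximum work. Once rank 1 can no longer hold a replica of block "A", the split placement becomes infeasible and the serial one is the only valid choice among the two.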

Paper Structure (34 sections, 10 theorems, 37 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Related Work
Background & Challenges
Definitions & Models
Parallel Model
Nodes & Ranks
Phases
Tasks
Shared memory blocks
Compute Model
Communication Model
Memory Model
CCM Model
Distributed & Constrained Load Balancing
Augmented Inform Stage
...and 19 more sections

Key Result

theorem 3.1

Figures (5)

Figure 1: The CCM-LB algorithm.
Figure 2: A Compute-Only Memory-Constrained Problem (COMCP) example for $I=2$, $K=3$, and $N=2$, with corresponding assignment sets and matrices.
Figure 3: A FWMP example for $I=2$, $K=3$, $M=4$, and $N=2$, with corresponding communication assignment sets and tensors.
Figure 4: Results comparing the Gurobi (MILP) solutions to CCM-LB.
Figure 5: Speedup of the assembly at each scale.

Theorems & Definitions (20)

theorem 3.1: Homing communications update formulæ
proof
theorem 5.1: Boolean shared block matrix relations
proof
theorem 5.2: Integer shared block matrix relations
proof
theorem 5.3: Boolean communication tensor relations
proof
theorem 5.4: Integer communication tensor relations
proof
...and 10 more
