A Communication- and Memory-Aware Model for Load Balancing Tasks
Jonathan Lifflander, Philippe P. Pebay, Nicole L. Slattengren, Pierre L. Pebay, Robert A. Pfeiffer, Joseph D. Kotulski, Sean T. McGovern
TL;DR
The paper tackles load balancing in distributed-memory systems under strict memory constraints by introducing CCM, a reduced-order model that jointly accounts for computation, communication, and memory. It proposes CCM-LB, a fully distributed heuristic load balancer, and validates its near-optimality against mixed-integer linear programming (MILP) reductions, COMCP and FWMP. The Gemma electromagnetics code serves as a practical testbed, where the approach achieves up to 2.3x speedups and demonstrates scalability, aided by a neural time predictor trained on diverse configurations. The work offers a principled, scalable path to performance-portable load balancing for irregular, memory-constrained workloads, with potential impact on exascale, task-based, memory-bound applications.
Abstract
While load balancing in distributed-memory computing has been well-studied, we present an innovative approach to this problem: a unified, reduced-order model that combines three key components to describe "work" in a distributed system: computation, communication, and memory. Our model enables an optimizer to explore complex tradeoffs in task placement, such as increased parallelism at the expense of data replication, which increases memory usage. We propose a fully distributed, heuristic-based load balancing optimization algorithm, and demonstrate that it quickly finds close-to-optimal solutions. We formalize the complex optimization problem as a mixed-integer linear program, and compare it to our strategy. Finally, we show that when applied to an electromagnetics code, our approach obtains up to 2.3x speedups for the imbalanced execution.
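To make the tradeoff described in the abstract concrete, the following is a minimal Python sketch of a toy cost model. It is not the paper's CCM formulation or the CCM-LB algorithm. Every name here (`Task`, `rank_work`, `rank_memory`, `greedy_step`), the communication weight `beta`, the memory accounting rule (each distinct shared block resident on a rank is counted once), and the example data are assumptions made for illustration. The sketch evaluates every single-task move and keeps the one that most reduces the maximum per-rank work without exceeding the destination rank's memory cap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    load: float  # estimated compute time (hypothetical units)
    block: int   # id of the shared data block the task needs resident


def rank_work(assign, tasks, edges, rank, beta=1.0):
    """Toy work: compute load plus beta times off-rank communication volume."""
    compute = sum(tasks[t].load for t, r in assign.items() if r == rank)
    # An edge costs this rank only if exactly one endpoint lives on it.
    comm = sum(vol for (a, b), vol in edges.items()
               if (assign[a] == rank) != (assign[b] == rank))
    return compute + beta * comm


def rank_memory(assign, tasks, block_size, rank):
    """Memory: each distinct shared block on the rank is counted once."""
    blocks = {tasks[t].block for t, r in assign.items() if r == rank}
    return sum(block_size[b] for b in blocks)


def max_work(assign, tasks, edges, ranks, beta=1.0):
    return max(rank_work(assign, tasks, edges, r, beta) for r in ranks)


def greedy_step(assign, tasks, edges, block_size, ranks, mem_cap, beta=1.0):
    """Return the best memory-feasible single-task move, or assign unchanged."""
    best, best_val = assign, max_work(assign, tasks, edges, ranks, beta)
    for t, src in assign.items():
        for dst in ranks:
            if dst == src:
                continue
            trial = dict(assign)
            trial[t] = dst
            # Moving a task may replicate its block on dst; reject if over cap.
            if rank_memory(trial, tasks, block_size, dst) > mem_cap:
                continue
            val = max_work(trial, tasks, edges, ranks, beta)
            if val < best_val:
                best, best_val = trial, val
    return best


# Hypothetical example: rank 0 is overloaded, rank 1 holds one small task.
tasks = {0: Task(4.0, 0), 1: Task(3.0, 0), 2: Task(2.0, 1), 3: Task(1.0, 2)}
start = {0: 0, 1: 0, 2: 0, 3: 1}
edges = {(0, 1): 0.5}            # tasks 0 and 1 exchange 0.5 units of data
block_size = {0: 10, 1: 2, 2: 8}
ranks = [0, 1]

loose = greedy_step(start, tasks, edges, block_size, ranks, mem_cap=20)
tight = greedy_step(start, tasks, edges, block_size, ranks, mem_cap=12)
print(loose, max_work(loose, tasks, edges, ranks))
print(tight, max_work(tight, tasks, edges, ranks))
```

With the loose cap, the sketch moves task 0 to rank 1, which replicates the large block 0 and introduces off-rank communication but lowers the maximum work the most. With the tight cap, that replication no longer fits, so it settles for moving the smaller task 2. This is the kind of parallelism-versus-replication tradeoff the abstract refers to, shown here only on toy data.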
