Load Balancing Using Sparse Communication

Gal Mendelson; Xu Kuang

TL;DR

This work introduces CARE, a modular model for load balancing under sparse communication, where a load balancer maintains a state approximation of server queues via a dedicated Approximation component and routes with JSAQ. It proposes MSR-based queue emulation and three communication patterns (RT, DT, ET) to bound maximal approximation error while using far less than full state information. The authors establish diffusion-scale results (SDDP) showing that, for bounded approximation error, the system achieves asymptotically optimal workload and near-optimal delay, linking communication sparsity to performance rigorously. Simulations demonstrate substantial communication reductions (up to ~90%) with competitive or superior performance compared with JSQ, SQ(2), and RR, guiding practical design decisions for data-center-like systems.

Abstract

Load balancing across parallel servers is an important class of congestion control problems that arises in service systems. An effective load balancer relies heavily on accurate, real-time congestion information to make routing decisions. However, obtaining such information can impose significant communication overhead, especially in demanding applications like those found in modern data centers. We introduce a framework for communication-aware load balancing and design new load balancing algorithms that perform exceptionally well even with sparse communication patterns. Central to our approach is state approximation: the load balancer first estimates server states through a communication protocol, and then uses these approximate states within a load balancing algorithm to make routing decisions. We demonstrate that by using a novel communication protocol, one can achieve accurate queue length approximation with sparse communication: for a maximal approximation error of $x$, the communication frequency only needs to be $O(1/x^2)$. We further show, via a diffusion analysis, that a constant maximal approximation error is sufficient for achieving asymptotically optimal performance. Taken together, these results demonstrate that highly performant load balancing is possible with very little communication. Through simulations, we observe that the proposed designs match or surpass the performance of state-of-the-art load balancing algorithms while reducing communication rates by up to 90%.

Paper Structure (60 sections, 20 theorems, 183 equations, 13 figures)

Introduction
State Approximation as Intermediary
Preview of Main Contributions
Organization
Notation
The CARE Model and Summary of Contributions
The CARE Model
The Environment Component
Communication Component
Approximation Component
Resource Allocation Component
Metrics for Communication, Approximation and Performance
Communication Metric
Approximation Metric
Performance Metric
...and 45 more sections

Key Result

Theorem 2.3

For every $x \in \{2,3,4,\ldots\}$ there exists a combination of an approximation algorithm and communication pattern under which $AQ(t)\leq x-1$ and $M(t)\leq \frac{1}{x}D(t)$, for all $t\geq 0$. Specifically, this holds for DT-$x$ or ET-$x$ with the basic or MSR-$x$ approximation algorithms.
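The two bounds in Theorem 2.3 can be checked numerically on a toy system. The sketch below is illustrative only and rests on assumptions that this summary does not spell out: DT-$x$ is read as "each server sends one message after every $x$-th departure", and the basic approximation algorithm is read as "increment a server's estimate on each job routed to it, and reset the estimate to the true queue length when a message arrives". The slotted-time dynamics, the function name `simulate_dt_x`, and all parameter values are hypothetical choices made for this example, not the authors' simulation setup.

```python
import random


def simulate_dt_x(x, num_servers=4, steps=20000, arrival_prob=0.8,
                  service_prob=0.25, seed=0):
    """Toy slotted-time simulation of routing on approximate queue lengths.

    Assumed protocol (not confirmed by the source): a server reports its true
    queue length after every x-th departure; between reports the load balancer
    only adds the jobs it routes itself to its estimate.
    Returns (max approximation error, messages sent, total departures).
    """
    rng = random.Random(seed)
    queues = [0] * num_servers      # true queue lengths
    approx = [0] * num_servers      # load balancer's estimates
    since_msg = [0] * num_servers   # departures since the last message
    max_err = messages = departures = 0

    for _ in range(steps):
        # At most one arrival per slot, routed to the smallest estimate.
        if rng.random() < arrival_prob:
            target = min(range(num_servers), key=lambda i: approx[i])
            queues[target] += 1
            approx[target] += 1
        # Each busy server completes a job with probability service_prob.
        for i in range(num_servers):
            if queues[i] > 0 and rng.random() < service_prob:
                queues[i] -= 1
                departures += 1
                since_msg[i] += 1
                if since_msg[i] == x:   # every x-th departure triggers a message
                    approx[i] = queues[i]
                    since_msg[i] = 0
                    messages += 1
        slot_err = max(abs(a - q) for a, q in zip(approx, queues))
        max_err = max(max_err, slot_err)

    return max_err, messages, departures


for x in (2, 3, 5):
    err, msgs, deps = simulate_dt_x(x)
    assert err <= x - 1        # AQ(t) <= x - 1
    assert msgs * x <= deps    # M(t) <= D(t) / x
```

Under these assumptions the estimate can exceed the true queue length only by the number of unreported departures, which is at most $x-1$, and each message accounts for exactly $x$ departures, so both inequalities hold on every sample path rather than only on average.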

Figures (13)

Figure 1: A load balancer routes incoming jobs to servers working in parallel.
Figure 4: The CARE model in the context of load balancing with parallel servers.
Figure 5: Comparison of the communication rates of different load balancing architectures in a system with $K$ servers. Communication rates are measured in number of messages per arrival. (D) denotes the rate under the implementation where servers notify the load balancer upon job departures, while (A) denotes the implementation where the load balancer queries servers upon each arrival. The parameter $x$ controls the quality of approximation, and can be roughly interpreted as the maximum error tolerance in the queue length approximations.
Figure 6: The communication requirement of ET-$x$ with MSR for different loads and maximal approximation error $y$ ($=x-1$), relative to the communication required for full state information. The required communication decreases quadratically with $y$, and is lower than the upper bound derived in Theorem \ref{thm:summary 3}.
Figure 7: The communication requirement of ET-$x$ with MSR-$x$ for different loads and maximal approximation error $y$ ($=x-1$), relative to the communication required for full state information. The required communication is lower than the upper bound derived in Theorem \ref{thm:summary 1}, but higher than that of ET-$x$ with MSR.
...and 8 more figures

Theorems & Definitions (49)

Theorem 2.3
Theorem 2.4
Theorem 2.5
Definition 4.2: The basic approximation algorithm
Proposition 4.3
Definition 4.4: The general queue length emulation approach
Remark 4.5
Remark 4.6
Definition 4.8: The MSR approximation algorithm
Definition 4.9: The MSR-x approximation algorithm
...and 39 more
