Load Balancing Using Sparse Communication

Gal Mendelson, Xu Kuang

TL;DR

This work introduces CARE, a modular model for load balancing under sparse communication, in which a load balancer maintains an approximation of server queue states via a dedicated Approximation component and routes jobs using JSAQ. It proposes MSR-based queue emulation and three communication patterns (RT, DT, ET) that bound the maximal approximation error while using far less than full state information. The authors establish diffusion-scale results (SDDP) showing that, for bounded approximation error, the system achieves asymptotically optimal workload and near-optimal delay, rigorously linking communication sparsity to performance. Simulations demonstrate substantial communication reductions (up to ~90%) with competitive or superior performance compared with JSQ, SQ(2), and RR, guiding practical design decisions for data-center-like systems.
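The two CARE components described above (an approximation of each queue plus a routing rule that uses only that approximation) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: we read JSAQ as join-the-shortest-approximate-queue with uniform random tie-breaking, and we assume the approximation component counts every job it routes and overwrites its estimate whenever a server sends a report. The names `jsaq_route`, `ApproxBalancer`, `route`, and `on_message` are hypothetical.

```python
import random
from typing import List


def jsaq_route(approx: List[float], rng: random.Random) -> int:
    """Hypothetical JSAQ rule: pick a server with the smallest approximate
    queue length, breaking ties uniformly at random."""
    best = min(approx)
    candidates = [i for i, q in enumerate(approx) if q == best]
    return rng.choice(candidates)


class ApproxBalancer:
    """Hypothetical CARE-style balancer: an approximation component
    (the `approx` list) combined with JSAQ routing."""

    def __init__(self, num_servers: int, seed: int = 0) -> None:
        self.approx = [0.0] * num_servers
        self.rng = random.Random(seed)

    def route(self) -> int:
        server = jsaq_route(self.approx, self.rng)
        # The balancer sees every job it routes, so arrivals are counted exactly.
        self.approx[server] += 1
        return server

    def on_message(self, server: int, reported_queue: int) -> None:
        # An (infrequent) server report replaces the estimate with the true value.
        self.approx[server] = float(reported_queue)
```

For example, with three idle servers, three consecutive calls to `route()` send one job to each server; after `on_message(0, 0)` reports that server 0 has drained, the next job goes to server 0. Between reports the balancer never learns about departures, which is exactly the gap the communication patterns are meant to control.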

Abstract

Load balancing across parallel servers is an important class of congestion control problems that arises in service systems. An effective load balancer relies heavily on accurate, real-time congestion information to make routing decisions. However, obtaining such information can impose significant communication overheads, especially in demanding applications like those found in modern data centers. We introduce a framework for communication-aware load balancing and design new load balancing algorithms that perform exceptionally well even in scenarios with sparse communication patterns. Central to our approach is state approximation, where the load balancer first estimates server states through a communication protocol. Subsequently, it utilizes these approximate states within a load balancing algorithm to determine routing decisions. We demonstrate that by using a novel communication protocol, one can achieve accurate queue length approximation with sparse communication: for a maximal approximation error of x, the communication frequency only needs to be O(1/x^2). We further show, via a diffusion analysis, that a constant maximal approximation error is sufficient for achieving asymptotically optimal performance. Taken together, these results demonstrate that highly performant load balancing is possible with very little communication. Through simulations, we observe that the proposed designs match or surpass the performance of state-of-the-art load balancing algorithms while reducing communication rates by up to 90%.
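One way to see why an O(1/x^2) communication frequency is plausible is a toy simulation in which the server and the balancer run the same deterministic emulation of the queue, and the server reports only when the emulation has drifted too far from the truth. This is a hypothetical stand-in, not the paper's MSR algorithm or its exact ET pattern: we assume a saturated server (never idle) in discrete time, a departure with probability `p` per step, an emulation that removes `p` jobs per step, and a report whenever the absolute discrepancy reaches a tolerance `y`. Arrivals are known to both sides, so they cancel out of the discrepancy. The function name `error_triggered_sync` is hypothetical.

```python
import random
from typing import Tuple


def error_triggered_sync(y: int, steps: int, p: float = 0.5,
                         seed: int = 0) -> Tuple[int, int, float]:
    """Toy error-triggered reporting for one saturated server.

    Returns (messages, departures, largest discrepancy left in place)."""
    rng = random.Random(seed)
    dev = 0.0        # emulated minus true queue length since the last report
    messages = 0
    departures = 0
    max_err = 0.0
    for _ in range(steps):
        if rng.random() < p:   # a departure the balancer does not observe
            departures += 1
            dev += 1.0
        dev -= p               # the emulation removes p jobs' worth per step
        # The server can compute dev itself: it knows its own departures
        # and the emulation rule is deterministic.
        if abs(dev) >= y:
            messages += 1      # report the true queue length
            dev = 0.0          # estimate is exact again
        max_err = max(max_err, abs(dev))
    return messages, departures, max_err


for y in (1, 2, 4, 8):
    m, d, _ = error_triggered_sync(y, 200_000)
    print(y, m / d, 1 / (2 * y * y))
```

With `p = 0.5` the discrepancy is a symmetric random walk with steps of ±0.5, so a standard hitting-time argument gives about 1/(2y^2) messages per departure in the long run: doubling the error tolerance cuts communication roughly by a factor of four. The printed ratios should track the third column closely.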

Paper Structure (60 sections, 20 theorems, 183 equations, 13 figures)

Key Result

Theorem 2.3

For every $x \in \{2,3,4,\ldots\}$ there exists a combination of an approximation algorithm and communication pattern under which $AQ(t)\leq x-1$ and $M(t)\leq \frac{1}{x}D(t)$, for all $t\geq 0$. Specifically, this holds for DT-$x$ or ET-$x$ with the basic or MSR-x approximation algorithms.
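The two bounds in Theorem 2.3 can be checked on a small simulation. This is a minimal sketch under our own reading of the terms, not the paper's definitions: we assume DT-$x$ means the server sends its current queue length after every $x$-th departure, that the basic approximation algorithm keeps the last reported value plus every job routed since, that $AQ(t)$ is the approximation error, $M(t)$ the number of messages, and $D(t)$ the number of departures. The function `simulate_dt_basic` and its arrival and departure probabilities are hypothetical.

```python
import random
from typing import Tuple


def simulate_dt_basic(x: int, steps: int, p_arrival: float = 0.45,
                      p_departure: float = 0.5, seed: int = 1) -> Tuple[int, int, int]:
    """One server in discrete time with an assumed DT-x pattern and the
    assumed basic approximation rule. Returns (messages, departures, max error)."""
    rng = random.Random(seed)
    true_q = 0           # real queue length at the server
    approx_q = 0         # balancer's estimate
    since_report = 0     # departures since the last message
    messages = departures = max_err = 0
    for _ in range(steps):
        if rng.random() < p_arrival:   # balancer routes a job here and counts it
            true_q += 1
            approx_q += 1
        if true_q > 0 and rng.random() < p_departure:  # departure unseen by the balancer
            true_q -= 1
            departures += 1
            since_report += 1
            if since_report == x:      # report on every x-th departure
                approx_q = true_q
                since_report = 0
                messages += 1
        # approx_q - true_q always equals since_report, which lies in [0, x - 1]
        max_err = max(max_err, approx_q - true_q)
    return messages, departures, max_err
```

Under these assumptions the estimate only ever overcounts, by the number of departures since the last report, so the error never exceeds $x-1$, and exactly one message is sent per $x$ departures, giving $M(t)\leq D(t)/x$.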

Figures (13)

  • Figure 1: A load balancer routes incoming jobs to servers working in parallel.
  • Figure 4: The CARE model in the context of load balancing with parallel servers.
  • Figure 5: Comparison of the communication rates of different load balancing architectures in a system with $K$ servers. Communication rates are measured in number of messages per arrival. (D) denotes the rate under the implementation where servers notify the load balancer upon job departures, while (A) denotes the implementation where the load balancer queries servers upon each arrival. The parameter $x$ controls the quality of approximation, and can be roughly interpreted as the maximum error tolerance in the queue length approximations.
  • Figure 6: The communication requirement of ET-$x$ with MSR for different loads and maximal approximation error $y$ ($=x-1$), relative to the communication required for full state information. The required communication decreases quadratically with $y$ and is lower than the upper bound derived in the third summary theorem.
  • Figure 7: The communication requirement of ET-$x$ with MSR-$x$ for different loads and maximal approximation error $y$ ($=x-1$), relative to the communication required for full state information. The required communication is lower than the upper bound derived in Theorem 2.3, but higher than that of ET-$x$ with MSR.
  • ...and 8 more figures
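As a worked reading of the relative communication axis in these figures: if full state information is taken to cost one message per departure (the (D) implementation in the Figure 5 caption; this normalization is our assumption), then the bound $M(t)\leq \frac{1}{x}D(t)$ from Theorem 2.3, rewritten in terms of the maximal error $y = x-1$, gives a relative communication level of at most $1/(y+1)$:

```latex
\frac{M(t)}{D(t)} \;\le\; \frac{1}{x} \;=\; \frac{1}{y+1},
\qquad \text{e.g. } y = 9 \;\Longrightarrow\; \frac{M(t)}{D(t)} \le 0.1 .
```

This bound decays only linearly in $y$, so the quadratic decrease reported for ET-$x$ with MSR in Figure 6 falls below it as $y$ grows.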

Theorems & Definitions (49)

  • Theorem 2.3
  • Theorem 2.4
  • Theorem 2.5
  • Definition 4.2: The basic approximation algorithm
  • Proposition 4.3
  • Definition 4.4: The general queue length emulation approach
  • Remark 4.5
  • Remark 4.6
  • Definition 4.8: The MSR approximation algorithm
  • Definition 4.9: The MSR-x approximation algorithm
  • ...and 39 more