Power-of-$d$-Choices with Memory: Fluid Limit and Optimality
Jonatha Anselmi, Francois Dufour
TL;DR
This work analyzes a memory-enhanced power-of-$d$-choices load-balancing scheme for a large system of parallel servers. By formulating a continuous-time Markov model and deriving its fluid limit, the authors obtain a fixed-point description and prove exponential convergence to a unique fixed point when $\lambda<1-1/d$, which yields asymptotic optimality; when this condition fails, they establish tight queue-length bounds. The results hinge on a detailed coupling of stochastic sample paths to a drift system with discontinuities, and the memory, which stores the latest observations of sampled servers, plays the central role in achieving mean-field optimality. In practice, the memory-augmented scheme balances load better than the classic SQ($d$) policy at the same communication cost of $d$ probes per job, and it performs robustly as system size and load vary.
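To make the dispatching rule concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' model or code. It assumes $N$ servers with unit-rate exponential service, Poisson arrivals at rate $\lambda N$, $d$ distinct servers probed per job, and a memory holding one entry per server ever observed. The rule that the dispatcher increments the remembered value of the server it has just sent a job to is also an assumption. `MemoryDispatcher` and `fraction_to_idle` are hypothetical names. The simulation uses uniformization, so the event count does not measure elapsed time, but the sequence of states seen by arrivals has the correct distribution.

```python
import random
from typing import Dict, List


class MemoryDispatcher:
    """Hypothetical power-of-d dispatcher with a memory of latest observations."""

    def __init__(self, n_servers: int, d: int, rng: random.Random) -> None:
        if not 1 <= d <= n_servers:
            raise ValueError("need 1 <= d <= n_servers")
        self.n_servers = n_servers
        self.d = d
        self.rng = rng
        # Server id -> latest queue length observed (or inferred) by the dispatcher.
        self.memory: Dict[int, int] = {}

    def dispatch(self, queues: List[int]) -> int:
        """Probe d distinct servers, refresh memory, return the chosen server."""
        # d probes per job: the same communication cost as classic SQ(d).
        for s in self.rng.sample(range(self.n_servers), self.d):
            self.memory[s] = queues[s]
        # Lowest remembered observation wins; ties go to the earliest-inserted key.
        target = min(self.memory, key=self.memory.__getitem__)
        # Assumption: the dispatcher records the job it has just sent.
        self.memory[target] += 1
        return target


def fraction_to_idle(n_servers: int, d: int, lam: float,
                     n_events: int, seed: int = 0) -> float:
    """Share of arrivals routed to an empty server in a uniformized simulation.

    Assumes Poisson arrivals at rate lam * n_servers and unit-rate exponential
    service at every server; uniformization at total rate (lam + 1) * n_servers.
    """
    rng = random.Random(seed)
    queues = [0] * n_servers
    dispatcher = MemoryDispatcher(n_servers, d, rng)
    arrivals = to_idle = 0
    for _ in range(n_events):
        if rng.random() < lam / (lam + 1.0):
            # Arrival event: route the job using the memory-based rule.
            s = dispatcher.dispatch(queues)
            arrivals += 1
            if queues[s] == 0:
                to_idle += 1
            queues[s] += 1
        else:
            # Potential departure at a uniformly chosen server (self-loop if idle).
            s = rng.randrange(n_servers)
            if queues[s] > 0:
                queues[s] -= 1
    return to_idle / max(arrivals, 1)


if __name__ == "__main__":
    for lam in (0.4, 0.8):
        share = fraction_to_idle(n_servers=200, d=2, lam=lam, n_events=50_000)
        print(f"d=2, lam={lam}: share of jobs sent to idle servers = {share:.3f}")
```

With $d=2$, the load $\lambda=0.4$ lies below $1-1/d=0.5$ and $\lambda=0.8$ lies above it. The paper's result concerns the authors' model in the large-system limit, so a finite run of this sketch only hints at the contrast between the two regimes. Scanning the whole memory with `min` costs $O(N)$ per job; a heap keyed on remembered values would be the natural refinement for large $N$.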
Abstract
In multi-server distributed queueing systems, the access of stochastically arriving jobs to resources is often regulated by a dispatcher, also known as a load balancer. A fundamental problem is to design a load-balancing algorithm that minimizes the delays experienced by jobs. Over the last two decades, the power-of-$d$-choice algorithm, which dispatches each job to the least loaded of $d$ servers randomly sampled upon the job's arrival, has emerged as a breakthrough in the foundations of this area owing to its versatility and appealing asymptotic properties. In this paper, we consider the power-of-$d$-choice algorithm augmented with a local memory that keeps track of the latest observations collected over time on the sampled servers; each job is then sent to a server with the lowest observation. We show that this algorithm is asymptotically optimal, in the sense that the load balancer can always assign each job to an idle server in the large-system limit, if and only if the system load $\lambda$ is less than $1-\frac{1}{d}$. If this condition is not satisfied, we show that queue lengths are tightly bounded by $\left\lceil -\frac{\log(1-\lambda)}{\log(\lambda d+1)} \right\rceil$. This contrasts with the classic power-of-$d$-choice algorithm, where, in equilibrium at the fluid scale, a strictly positive proportion of servers contains $i$ jobs for every $i\ge 0$. Our results quantify and highlight the importance of memory as a means to enhance performance in randomized load balancing.
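The two quantities stated in the abstract are straightforward to evaluate. The Python sketch below computes the load threshold $1-\frac{1}{d}$ and the bound $\left\lceil -\frac{\log(1-\lambda)}{\log(\lambda d+1)} \right\rceil$ for $0<\lambda<1$; the function names are illustrative and do not come from the paper. A short calculation shows why the two statements fit together: $-\log(1-\lambda)\le\log(\lambda d+1)$ is equivalent to $(1-\lambda)(1+\lambda d)\ge 1$, that is, $\lambda(d-1-\lambda d)\ge 0$, that is, $\lambda\le 1-\frac{1}{d}$. Hence the ceiling equals 1 precisely when $\lambda\le 1-\frac{1}{d}$ and exceeds 1 beyond that threshold.

```python
import math


def optimality_threshold(d: int) -> float:
    """Load 1 - 1/d below which the memory scheme is asymptotically optimal."""
    if d < 1:
        raise ValueError("d must be a positive integer")
    return 1.0 - 1.0 / d


def is_asymptotically_optimal(lam: float, d: int) -> bool:
    """The abstract's if-and-only-if condition: lam < 1 - 1/d."""
    return lam < optimality_threshold(d)


def queue_length_bound(lam: float, d: int) -> int:
    """ceil(-log(1 - lam) / log(lam * d + 1)), stated for lam >= 1 - 1/d."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    if d < 1:
        raise ValueError("d must be a positive integer")
    return math.ceil(-math.log(1.0 - lam) / math.log(lam * d + 1.0))


if __name__ == "__main__":
    for d in (2, 3, 5):
        for lam in (0.6, 0.9, 0.99):
            regime = "optimal" if is_asymptotically_optimal(lam, d) else "bounded"
            print(f"d={d} lam={lam}: threshold={optimality_threshold(d):.3f}, "
                  f"{regime}, bound={queue_length_bound(lam, d)}")
```

For example, with $d=2$ the threshold is $0.5$, and `queue_length_bound(0.9, 2)` returns 3 because $-\log(0.1)/\log(2.8)\approx 2.24$. Evaluating the bound exactly at $\lambda=1-\frac{1}{d}$ is best avoided in floating point, since the ratio is exactly 1 there and rounding error can push the ceiling to 2.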
