Online Learning of Weakly Coupled MDP Policies for Load Balancing and Auto Scaling

S. R. Eshwar; Lucas Lopes Felipe; Alexandre Reiffers-Masson; Daniel Sadoc Menasché; Gugan Thoppe

TL;DR

This paper introduces a novel model and algorithms for tuning load balancers coupled with auto scalers under bursty traffic arriving at finite queues, offering insights into the effective management of distributed systems.

Abstract

Load balancing and auto scaling are at the core of scalable, contemporary systems, addressing dynamic resource allocation and service rate adjustments in response to workload changes. This paper introduces a novel model and algorithms for tuning load balancers coupled with auto scalers, considering bursty traffic arriving at finite queues. We begin by presenting the problem as a weakly coupled Markov Decision Process (MDP), solvable via a linear program (LP). However, because the number of control variables of this LP grows combinatorially, we introduce a more tractable relaxed LP formulation and extend it to tackle online parameter learning and policy optimization using a two-timescale algorithm based on the LP Lagrangian.

Paper Structure (13 sections, 1 theorem, 11 equations, 1 figure)

This paper contains 13 sections, 1 theorem, 11 equations, 1 figure.

Introduction
Model
Problem formulation and solution
Optimization problem
Problem formulation
Optimization problem
Relaxed problem becomes a manageable linear program
A two-timescale stochastic approximation algorithm
Experiments
How does the optimal policy behave?
What are the impacts of different parameters on the optimal policy?
How do the proposed algorithms behave?
Conclusion

Key Result

Lemma 1

Let $s, s' \in \{0, 1, \dots, K\}$, $a \in \{0,1\}$, and $b \in \{0,1\}$. Then, the transition probabilities $\mathbb{P}(S_n(T_{i+1}) = s' \mid S_n(T_{i}) = s, A_n(T_{i}) = a, B_n(T_{i}) = b)$, denoted by $P_{s,s',a,b}$, are given by: where $P_{s,s',a,b,\tau} := \mathbb{P}(S_n(T_{i+1}) = s' \mid S_n(T_{i}) = s, A_n(T_{i}) = a, B_n(T_{i}) = b, T_{i+1} - T_{i} = \tau)$. Under CJS, Under SJS,
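
The displayed expressions of the lemma do not appear in the text above. As a hedged reading only, the definition of $P_{s,s',a,b,\tau}$ conditions on the inter-decision time $T_{i+1} - T_{i} = \tau$, so the law of total probability gives the generic identity below. The specific distribution of $T_{i+1} - T_{i}$ and the closed forms under CJS and SJS are not recoverable from this text and are not assumed here.

$$
P_{s,s',a,b} \;=\; \sum_{\tau} \mathbb{P}\big(T_{i+1} - T_{i} = \tau \,\big|\, S_n(T_{i}) = s,\ A_n(T_{i}) = a,\ B_n(T_{i}) = b\big)\; P_{s,s',a,b,\tau}
$$

If the inter-decision time happens to be independent of the state and actions, the conditioning in the first factor drops and the sum becomes a plain mixture of the $\tau$-conditional transition probabilities.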

Figures (1)

Figure 1: Analysis of the impact of (a) arrival probability $p$, (b) high service rate budget $\beta$, and (c) job dropping cost $\gamma$ on the solution of the LP-based policy \ref{eq:LP-bandit}. The figure also shows the impact of the regularization coefficient, $\Gamma$, on the optimal policy.

Theorems & Definitions (1)

Lemma 1
